Summary of Progress


During the period December 1, 1987 - May 31, 1988, progress was made in the following 
areas: 

1) Construction of Multi-Dimensional Bandwidth Efficient Trellis Codes with MPSK Modulation.

Multi-dimensional trellis coded modulation schemes using either 8PSK or 16PSK modulation appear to have great promise for achieving high data rates on satellite communication channels. Work by Ungerboeck [1,2], Hemmati and Fang [3], Fujino et al. [4], and others has demonstrated that 2-dimensional (one signal per time unit) rate 2/3 (3/4) trellis coded 8PSK (16PSK) modulation is capable of achieving data rates in excess of 100 Mbps on satellite channels. Even higher rates are possible with multi-dimensional trellis coded schemes. For example, with 2L-dimensional schemes, L ≥ 2, where L signals are transmitted per time unit, speeds of up to L times those achievable with 2-dimensional schemes may be possible. This depends on the development of fast techniques for computing the metric of L successive signals on a trellis branch in the Viterbi algorithm. For moderate values of L (L ≤ 4), this seems feasible using table look-up methods.

We have conducted an extensive search for good multi-dimensional trellis codes with 2 ≤ L ≤ 4 for both 8PSK and 16PSK modulation. These codes achieve coding gains (over uncoded transmission at the same rate) of up to 5.5 dB. In addition, many of the codes are fully transparent to discrete phase rotations of the signal set (45° transparency for 8PSK and 22.5° transparency for 16PSK) through the use of differential encoding. A paper summarizing our work in this area has been accepted for publication by the IEEE Transactions on Information Theory and is included as Appendix A of this report [5].

It is recommended that NASA proceed with the development of one of these codes for its future high-speed satellite transmission schemes. A good choice would be the six-dimensional (L = 3), 16-state, rate 7/8, 8PSK code listed in Table 9(b) of the paper. This code has a 3.57 dB coding gain compared to uncoded modulation of the same rate and is transparent to 90° phase rotations of the signal set. With proper decoder implementation, this code would be capable of operating at three times (L = 3) the speed of a comparable two-dimensional code. In terms of current technology, this offers the possibility of reliable transmission at speeds in excess of 300 Mbps.

2) Performance Analysis of Bandwidth Efficient Trellis Coded Modulation Schemes 

Most of the bandwidth efficient trellis code constructions that have been published in the literature measure performance with the parameter $d^2_{\text{free}}$, the minimum free squared Euclidean distance of the code. This is determined by the two codewords (signal sequences) that are closest together in terms of squared Euclidean distance. This parameter determines the asymptotic (high signal-to-noise ratio) coding gain γ of the system through the formula

$$\gamma = 10 \log_{10} \frac{d^2_{\text{free}}}{d^2_u},$$

where $d^2_u$ is the minimum squared Euclidean distance of an uncoded system with the same rate.
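As a small numerical illustration of the formula above (a sketch, not part of the reported work), the function below evaluates γ from $d^2_{\text{free}}$ and $d^2_u$. The example values are hypothetical: a code with $d^2_{\text{free}} = 4.0$ compared with uncoded QPSK ($d^2_u = 2.0$), both at unit signal energy.

```python
import math


def asymptotic_coding_gain_db(d2_free: float, d2_uncoded: float) -> float:
    """Asymptotic coding gain gamma = 10*log10(d2_free / d2_u), in dB."""
    if d2_free <= 0 or d2_uncoded <= 0:
        raise ValueError("squared distances must be positive")
    return 10.0 * math.log10(d2_free / d2_uncoded)


# Hypothetical code with d2_free = 4.0, compared with uncoded QPSK (d2_u = 2.0).
print(f"{asymptotic_coding_gain_db(4.0, 2.0):.2f} dB")
```

Doubling the squared free distance corresponds to roughly 3 dB of asymptotic gain.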

Unfortunately, $\gamma$ or $d^2_{\text{free}}$ may not give a very accurate picture of relative code performance at the more moderate signal-to-noise ratios (SNR's) where most practical systems operate. In particular, for SNR's that result in decoded bit error rates of around $10^{-4}$ to $10^{-6}$, the asymptotic coding gain may be a poor estimate of code performance. This effect, which also occurs for convolutional codes with binary modulation, seems to be more pronounced for bandwidth efficient trellis codes because of their larger numbers of nearest neighbors. We have found that, to determine the performance of bandwidth efficient trellis codes accurately, it is necessary to find not only the minimum free distance but also several of the next larger distances. This involves considerably more computation than finding the minimum free distance alone.

Another problem with determining the performance of trellis coded modulation schemes is 
that the codes are not linear, due to the non-linear mapping from encoder outputs into signal 
points. This makes the determination of the code distances much more involved than for linear 
codes, since we can no longer assume that the all-zero codeword was transmitted. Indeed, the 
computation of a distance spectrum for a non-linear trellis code must involve an average over all 
possible transmitted codewords. 
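The dependence on the transmitted codeword can already be seen for a single naturally mapped 8PSK symbol. The sketch below (assuming unit-energy 8PSK and natural mapping, as used elsewhere in this report) computes the squared distance between the point labelled y and the point labelled y XOR e for every y. For some error patterns e this distance changes with y, so a distance spectrum must be averaged over transmitted symbols rather than computed from the all-zero sequence alone.

```python
import math

M = 8  # 8PSK, unit energy, natural mapping: label y sits at angle 2*pi*y/M


def sq_dist(a: int, b: int) -> float:
    """Squared Euclidean distance between the 8PSK points labelled a and b."""
    return 2.0 - 2.0 * math.cos(2.0 * math.pi * (a - b) / M)


def distances_for_pattern(e: int):
    """d^2 between the point labelled y and the point labelled y XOR e, for every y."""
    return [sq_dist(y, y ^ e) for y in range(M)]


for e in range(1, M):
    d = distances_for_pattern(e)
    distinct = sorted({round(x, 4) for x in d})
    print(f"e={e:03b}  distinct d^2 = {distinct}  average = {sum(d) / M:.4f}")
```

With natural mapping, the patterns e = 011 and e = 111 each produce two different distances depending on the transmitted label; this is the non-linearity that forces the averaging.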

The above difficulties notwithstanding, we have been able to develop an efficient algorithm for determining the distance spectrum of trellis codes. A paper based on this algorithm has been submitted to the IEEE Journal on Selected Areas in Communications and is included as Appendix B of this report [6]. Using this algorithm, we can obtain an accurate performance estimate for most of the best known trellis coded modulation schemes.

3) Performance Analysis of Bandwidth Efficient Trellis Codes on Fading Channels. 

In the area of mobile satellite communications, it is necessary to use coding techniques that are designed to combat signal fading. For binary coding, this simply involves the use of interleaving. For bandwidth efficient codes using MPSK modulation, however, it has been shown by Hagenauer et al. [7], Hagenauer and Lutz [8], and Simon and Divsalar [9] that codes designed for the AWGN channel will not perform well on a fading channel, even with interleaving.

We have derived performance bounds for bandwidth efficient trellis codes on Rayleigh and Rician fading channels. These bounds show that two new parameters, the effective length and the minimum product distance, are more important than the free distance and the path multiplicity when designing codes for fading channels. New trellis codes for fading channels with 8PSK modulation have been constructed, and it is shown that these codes outperform codes of the same complexity designed for the AWGN channel. A paper summarizing these results has been submitted to the IEEE Journal on Selected Areas in Communications and is included as Appendix C of this report [10].
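To make the two fading-channel parameters concrete, the sketch below computes them for a single pair of 8PSK label sequences. It assumes the definitions commonly used in the fading-channel literature: the effective length is the number of symbol positions in which the two sequences differ, and the product distance is the product of the squared Euclidean distances over those positions. The function names and the example sequences are illustrative only.

```python
import math


def sq_dist_mpsk(a: int, b: int, m: int) -> float:
    """Squared Euclidean distance between naturally mapped, unit-energy MPSK points."""
    return 2.0 - 2.0 * math.cos(2.0 * math.pi * (a - b) / m)


def effective_length_and_product_distance(x, x_hat, m=8):
    """Return (effective length, product distance) for two equal-length label sequences.

    Assumed definitions: effective length = number of positions where the symbols
    differ; product distance = product of the squared distances at those positions.
    """
    if len(x) != len(x_hat):
        raise ValueError("sequences must have the same length")
    differing = [(a, b) for a, b in zip(x, x_hat) if a != b]
    product = 1.0
    for a, b in differing:
        product *= sq_dist_mpsk(a, b, m)
    return len(differing), product


# Two hypothetical 8PSK label sequences differing in two positions.
print(effective_length_and_product_distance([0, 0, 0], [1, 0, 2]))
```

The minimum product distance of a code would then be taken over the error events of smallest effective length; the sketch only evaluates a single pair.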
Abstract

In this paper, multi-dimensional trellis coded MPSK modulation is investigated. A 2L-dimensional (L ≥ 2) MPSK signal set is obtained by forming the Cartesian product of L 2-dimensional MPSK signal sets. A systematic approach to partitioning multi-D signal sets, based on block coding, is used. An encoder system design approach is developed which incorporates the design of a differential precoder, a systematic convolutional encoder, and a signal set mapper. Multi-dimensional trellis coded 8PSK and 16PSK modulation schemes are found for a variety of rates and decoder complexities, many of which are fully transparent to discrete phase rotations of the signal set. Asymptotic coding gains of up to 5.5 dB have been found for these codes.
1 Introduction


Since the publication of the paper by Ungerboeck [1], Trellis Coded Modulation (TCM) has become a very active research area [2-9]. The basic idea of TCM is that by trellis coding onto an expanded signal set (relative to that needed for uncoded transmission), both power and bandwidth efficient communication can be achieved.

TCM can be classified into two basic types, the lattice type (e.g., M-PAM, M-QASK) and the constant-envelope type (e.g., MPSK). The latter has lower power efficiency than the former but is more suitable for band-limited satellite channels containing nonlinear amplifiers such as traveling wave tubes (TWT). Taylor and Chan [10] and Wilson et al. [11] have studied the performance of rate 2/3 TC-8PSK and rate 3/4 TC-16PSK, respectively, for various channel bandwidths and TWT operating points. Their results showed that TC-MPSK modulation schemes are quite robust under typical channel conditions.

In any TCM design, partitioning of the signal set into subsets with increasing intra-subset minimum distances plays a central role. It defines the signal mapping used by the modulator and provides a tight bound on the minimum free squared Euclidean distance (FSED) between code sequences, allowing an efficient search for optimum codes. For lattice-type TCM, Calderbank and Sloane [8] have made the important observation that partitioning the signal set into subsets corresponds to partitioning a lattice into a sublattice and its cosets. Forney [9] has developed a method, called the “squaring construction”, of partitioning higher dimensional lattices from a lower dimensional lattice by using a coset code.

In this paper, we investigate a class of multi-dimensional (multi-D) trellis coded MPSK (TC-MPSK) modulation schemes. The 2L-dimensional (2L-D) MPSK signal set is generated by simply repeating an MPSK signal set L times (L ≥ 2). Therefore, the 2L-D MPSK signal set is the Cartesian product of L 2-D MPSK signal sets. Multi-D MPSK signal sets provide a number of advantages that cannot be found in a 2-D signal set: (i) flexibility in achieving higher effective information rates, (ii) better coding gains, (iii) easy construction of some codes that are invariant to phase rotations, and (iv) due to their byte-oriented nature, suitability for use as inner codes in a concatenated coding system [12].

In Section 3, we introduce a block coding technique for partitioning a multi-D MPSK signal set. We will show that partitioning a 2L-D MPSK signal set is isomorphic to partitioning an $L \times \log_2 M$ binary matrix space. Because this section is mathematically rigorous, a brief description of its major ideas and concepts is given first, in Section 2, by way of an example. Section 4 describes how the encoder system, comprising a differential precoder, a systematic convolutional encoder, and a multi-D signal set mapper, is constructed from the best codes found in a systematic code search. The signal sets are constructed such that the codes are transparent to integer multiples of 360°/M rotations of the MPSK signal set. The systematic code search is based on maximizing the FSED (and thus the asymptotic coding gain) as well as minimizing the number of nearest neighbors for each phase transparency. 4-D, 6-D, and 8-D TC-8PSK codes and 4-D and 6-D TC-16PSK codes are listed, with coding gains of up to 5.5 dB compared to an uncoded system. In addition, these codes require no bandwidth expansion.
2 A Block Coding View of Set Partitioning


In this section we give a description of how partitioning a 2-D 8PSK signal 
set can be viewed in terms of block coding. This relatively simple example is used 
to describe the concepts used to partition multi-D MPSK signal sets in Section 3. 

A naturally mapped 8PSK signal set is illustrated in Figure 1. The reason for using natural mapping is that the three mapped bits can be used directly to indicate the minimum squared subset distance (MSSD). If each of the three bits $y^0$, $y^1$, and $y^2$ is allowed to be 0 or 1, i.e., $y^j \in \{0,1\}$ for j = 0, 1, and 2, then there will be some combinations (e.g., 000 and 111) where the minimum possible distance between two points is achieved. In this case the MSSD will be $2 - \sqrt{2} \approx 0.586$ if the average energy of the signal set is taken to be one. However, if we set $y^0 = 0$ and let $y^j \in \{0,1\}$ for j = 1 and 2, then the MSSD of the resulting subset will be 2. One can view this as $y^0$ belonging to a simple length-one block code that has only one codeword, i.e., 0. We say that the Hamming distance of this block code is infinity, since that is the distance required to reach all the other (non-existent) codewords. This length-one block code concept can also be applied to the other two mapping bits $y^1$ and $y^2$. These can be thought of as uncoded cases where there are two codewords, 0 and 1, and where the Hamming distance is one.
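The MSSD values quoted above can be checked by brute force. This sketch assumes unit-energy 8PSK with natural mapping (label y placed at angle 2πy/8) and computes the minimum squared distance of the subset obtained by fixing the k least significant label bits to 0; the helper names are illustrative.

```python
import itertools
import math

M = 8  # naturally mapped 8PSK: label y = 4*y2 + 2*y1 + y0 at angle 2*pi*y/M, unit energy


def point(y: int) -> complex:
    return complex(math.cos(2 * math.pi * y / M), math.sin(2 * math.pi * y / M))


def mssd(labels) -> float:
    """Minimum squared Euclidean distance within a subset of labels (inf for one point)."""
    return min(
        (abs(point(a) - point(b)) ** 2 for a, b in itertools.combinations(labels, 2)),
        default=math.inf,
    )


# Fix the k least significant bits y^0, ..., y^(k-1) to 0 and measure the subset.
for k in range(4):
    subset = [y for y in range(M) if y % 2 ** k == 0]
    print(k, subset, round(mssd(subset), 3))
```

The printed values step through 0.586, 2.0, 4.0, and infinity as bits are fixed, and the coset with $y^0 = 1$ gives the same 2.0 as the subset with $y^0 = 0$.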

One may ask “What is the use of this block code description, when we 
have the much simpler description given by Ungerboeck [1]?”. As will be seen, 
although this is a complicated description for the simple 2-D case, for higher 
dimensions this description yields a powerful and easy method of partitioning a 
multi-D signal set. 

Now that we have described the signal set in terms of block codes, albeit trivial ones, we can use the equation for MSSD given by Sayegh [13], based on work originally done by Cusack [14]. Before we give this equation, some notation is needed. Let $d_j$ be the Hamming distance of the block code corresponding to bit $y^j$, for j = 0, 1, and 2. Also let $\delta_j^2$ be the MSSD that corresponds to setting $y^0, \ldots, y^{j-1}$ to 0, for j = 0, 1, and 2. We have already determined that $\delta_0^2 = 0.586$ and $\delta_1^2 = 2$. For the remaining MSSD, it can easily be shown that $\delta_2^2 = 4$. The 8PSK signal set can be seen to have three levels of partitioning. A parameter p is used to designate the partitioning level. The initial level of partitioning is denoted by p = 0. This corresponds to all eight 8PSK signal points. The first level of partitioning (p = 1) corresponds to a subset of four points. This can be continued until we reach the final level with p = 3 and only a single point. From [16], the MSSD at partition level p is

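$$\Delta_p^2 \ge \min\left(d_2\,\delta_2^2,\; d_1\,\delta_1^2,\; d_0\,\delta_0^2\right),$$
where the Hamming distances $d_j$ are those of the block codes in use at partition level p.

Due to the symmetry of the 8PSK signal set, the lower bound is an equality. For p = 0, $d_2 = d_1 = d_0 = 1$, and thus $\Delta_0^2 = 0.586$. For p = 1, $d_0 = \infty$, and thus $\Delta_1^2 = 2$. Similarly $\Delta_2^2 = 4$ and $\Delta_3^2 = \infty$. Note that in this case $\Delta_p^2 = \delta_p^2$. However, for multi-D schemes, more complicated block codes are used where $d_j$ can have values of 2, 3, or more. Thus, in some cases, $\Delta_p^2 \ne \delta_p^2$.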

Note that the above example considers only a single branch of a partition. Going back to our original description, we can also set $y^0 = 1$, $y^1 \in \{0,1\}$, and $y^2 \in \{0,1\}$. Due to the symmetry of the 8PSK signal set, the subset selected also has an MSSD of $\delta_1^2$. A block code view of this subset is that it is the coset of the subset selected by $y^0 = 0$. The reason is that we can take the coset representative (which is 1) of the coset {1} corresponding to the simple block code {0} and add it modulo-2 to $y^0 = 0$, which selects the first subset, to obtain $y^0 = 1$, which selects the other subset. This coset representative is a codeword at the previous partition level, but is not a codeword at the current partition level. That is, the coset representative (1) belongs to the code {0,1} at p = 0 but not to the code {0} at p = 1. In a similar manner, codes corresponding to $y^1$ and $y^2$ can be partitioned using coset representatives until all 8 signal points belong to subsets containing only a single point. This is an important concept, since a multi-D signal set partition can be directly described by its cosets, right down to a single point, as will be shown in Section 3.

To obtain a multi-D signal set, one can view $y^j$ as containing more than one bit. In fact, $y^j$ becomes a vector $v^j$ that corresponds to the jth bits of two or more signal sets. This vector $v^j$ contains only one bit for a 2-D signal set, and thus the block codes have length one. However, for a 2L-D signal set, there are L bits in the vector $v^j$, which will belong either to a block code of length L or to one of its cosets, depending on which partition path is chosen. If there are $M = 2^I$ signals in each 2-D signal set, then there will be $I = \log_2 M$ sets of these codewords.
3 Multi-D MPSK Signal Set Partitioning

We begin this section with a discussion of partitioning a binary matrix space. We then show that partitioning a 2L-D MPSK signal space is isomorphic to partitioning an $L \times I$ binary matrix space, where $I = \log_2 M$.

3.1 Partitioning a binary matrix space 

Let $C_m$, with $m = 0, 1, \ldots, L$, be a sequence of $(L, L-m)$ binary linear block codes with generator matrices $G_m$ and Hamming distances $d_m$ such that $C_L \subset C_{L-1} \subset \cdots \subset C_1 \subset C_0$. Denote the L-D binary vector space by $V_L = \{0,1\}^L$. Then $C_0 = V_L$, and $C_0/C_1/\cdots/C_{L-1}/C_L$ forms a 2/2/.../2/2 (L times)-way binary vector space partition chain. The $2^m$-way binary vector space partition $C_0/C_m$ divides $C_0$ into $C_m$ and its $2^m - 1$ cosets $C_m(1), C_m(2), \ldots, C_m(2^m - 1)$. Let $t_m(u)$ be the coset representative of $C_m(u)$, where u is the integer representation of the binary vector $\mathbf{u} = [u^{m-1}, \ldots, u^1, u^0]$, i.e., $u = 2^{m-1}u^{m-1} + \cdots + 2^1u^1 + 2^0u^0$. Then $C_m(u)$ and $C_m$ are related by
$$C_m(u) = C_m \oplus t_m(u), \qquad (1)$$
where $\oplus$ indicates modulo-2 addition.

The coset representative of $C_m$ is the L-D all-zero vector and is denoted by $t_m(0)$. Let $\tau_m = t_m(2^{m-1})$ be the coset representative such that
$$\tau_m \in C_{m-1}, \qquad \tau_m \notin C_m.$$
We call $\tau_m$ a principal coset representative, since these are the particular coset representatives which can be used to fully describe all the cosets. The mapping that we assume is that the first m − 1 bits of u are all 0, so that $u^{m-1} = 0$ selects $C_m$ and $u^{m-1} = 1$ selects the coset $C_m(2^{m-1})$. Note that $\tau_m$ can be any codeword in $C_m(2^{m-1})$. Thus an expression for any coset representative is
$$t_m(u) = u^{m-1}\tau_m \oplus \cdots \oplus u^1\tau_2 \oplus u^0\tau_1 = \bigoplus_{j=0}^{m-1} u^j \tau_{j+1}. \qquad (2)$$
Example 3.1:

For the 2-D binary vector space $V_2 = \{0,1\}^2$, we may form the following block codes:

$C_0$: (2,2) code, $G_0 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, $d_0 = 1$;

$C_1$: (2,1) code, $G_1 = [1\ 1]$, $d_1 = 2$;

$C_2$: (2,0) code, $G_2 = [0\ 0]$, $d_2 = \infty$.

Note that $C_2 \subset C_1 \subset C_0$. The 2/2-way partition chain $C_0/C_1/C_2$, along with the 2-way partitions $C_0/C_1$ and $C_1/C_2$ and the 4-way partition $C_0/C_2$, is shown in Figure 2. The principal coset representatives are $\tau_1 = [0\ 1]^T$ and $\tau_2 = [1\ 1]^T$.
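A minimal sketch of (1) and (2) for the codes of Example 3.1, with codewords represented as bit tuples. The helper names (`coset_rep`, `coset`) are hypothetical; the principal coset representatives are $\tau_1 = [0\ 1]^T$ and $\tau_2 = [1\ 1]^T$ from the example.

```python
from itertools import product

# Block codes of Example 3.1 as sets of length-2 bit tuples.
C0 = set(product((0, 1), repeat=2))  # (2,2) code
C1 = {(0, 0), (1, 1)}                # (2,1) code, G1 = [1 1]
C2 = {(0, 0)}                        # (2,0) code
TAU = {1: (0, 1), 2: (1, 1)}         # principal coset representatives tau_1, tau_2


def xor(a, b):
    return tuple(x ^ y for x, y in zip(a, b))


def coset_rep(m: int, u: int):
    """Eq. (2): t_m(u) = XOR over j of u^j * tau_(j+1), with u^j the j-th bit of u."""
    t = (0, 0)
    for j in range(m):
        if (u >> j) & 1:
            t = xor(t, TAU[j + 1])
    return t


def coset(code, rep):
    """Eq. (1): C_m(u) = C_m XOR t_m(u)."""
    return {xor(c, rep) for c in code}


# 2-way partition C0/C1 and 4-way partition C0/C2.
print([coset(C1, coset_rep(1, u)) for u in range(2)])
print([coset(C2, coset_rep(2, u)) for u in range(4)])
```

The four cosets of $C_2$ are single points that together cover $C_0$, which is the "partition down to a single point" idea of Section 2 in vector form.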

We now describe how I of the above block codes can be used to describe an $L \times I$ binary matrix space. Let $C_{m_i}$ be an $(L, L - m_i)$ linear block code that is a subspace of $V_L$, $m_i = 0, 1, \ldots, L$. Define $\Omega^p = \Omega(C_{m_{I-1}}, \ldots, C_{m_1}, C_{m_0})$, where $p = \sum_{i=0}^{I-1} m_i$ is the level of partitioning, as the set of all $L \times I$ binary matrices
$$\omega = [v^{I-1}, \ldots, v^1, v^0] = \begin{bmatrix} v_0^{I-1} & \cdots & v_0^1 & v_0^0 \\ v_1^{I-1} & \cdots & v_1^1 & v_1^0 \\ \vdots & & \vdots & \vdots \\ v_{L-1}^{I-1} & \cdots & v_{L-1}^1 & v_{L-1}^0 \end{bmatrix}, \qquad (3)$$


where $v^i \in C_{m_i}$, $i = 0, 1, \ldots, I-1$, and each $v^i$ is an L-dimensional column vector. $\Omega^p$ is a subspace of $\Omega^0 = \Omega(V_L, \ldots, V_L, V_L)$ and is a group under binary modulo-2 matrix addition. $\Omega^p$ is called the principal subset of $\Omega^0$. $\Omega^0/\Omega^p$ is a $2^p$-way binary matrix space partition which divides $\Omega^0$ into $\Omega^p$ and its $2^p - 1$ cosets $\Omega^p(z) = \Omega(C_{m_{I-1}}(z), \ldots, C_{m_0}(z))$, $1 \le z \le 2^p - 1$, where z is the integer representation of the binary vector $\mathbf{z} = [z^{p-1}, \ldots, z^1, z^0]$. $C_{m_i}(z)$ is either $C_{m_i}$ or a coset of $C_{m_i}$, depending on the partition level p and the particular value of z. The coset representative of $\Omega^p(z)$ is given by $\omega^p(z) = \omega(t_{m_{I-1}}(z), \ldots, t_{m_0}(z))$, where $t_{m_i}(z)$ is the coset representative of $C_{m_i}(z)$. The principal subset and its cosets are related by
$$\Omega^p(z) = \Omega^p \oplus \omega^p(z). \qquad (4)$$

$\Omega^p$ is said to be a subspace of $\Omega^{p'}$ if and only if $p \ge p'$ and $C_{m_i} \subseteq C_{m_i'}$, $i = 0, 1, \ldots, I-1$. In this case $\Omega^p$ partitions $\Omega^{p'}$ and forms a $2^{p-p'}$-way binary matrix space partition, and $\Omega^0/\Omega^{p'}/\Omega^p$ forms a $2^{p'}/2^{p-p'}$-way binary matrix space partition chain.


Example 3.2: 

Let $C_0 = V_2$, $C_1$, and $C_2$ be the (2,2), (2,1), and (2,0) binary block codes defined in Example 3.1. Table 1 illustrates a partition chain for I = 3. Thus $\Omega^0$ is a 2 × 3 binary matrix space, and $\Omega^1, \ldots, \Omega^6$ are all principal subsets of $\Omega^0$. Moreover, $\Omega^0 \supset \Omega^1 \supset \Omega^2 \supset \Omega^3 \supset \Omega^4 \supset \Omega^5 \supset \Omega^6$. Therefore, $\Omega^0/\Omega^1/\Omega^2/\Omega^3/\Omega^4/\Omega^5/\Omega^6$ forms a 2/2/2/2/2/2-way binary matrix space partition chain. The first three levels of this partition chain are shown schematically in Figure 3. When $C_{m_i}(z)$, z > 0, is the same as $C_{m_i}$, then $C_{m_i}$ is given. The coset representatives can be determined using a technique similar to that described in (2) at the beginning of this subsection, that is, by the use of principal coset representatives. However, it is not always necessary to use linear or modulo-2 arithmetic, as will be shown in Section 4, where non-linear arithmetic is used. This allows the binary matrix space to have special properties that will be described later. The coset representatives for Figure 3 can be determined easily from the partition chains. For example, the coset representatives of $\Omega^3$ are given by


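$$t_{m_0}(z) = z^1\tau_2 \oplus z^0\tau_1 = \begin{bmatrix} z^1 \\ z^1 \oplus z^0 \end{bmatrix}, \qquad t_{m_1}(z) = (z^2 \oplus z^0 \cdot z^1)\,\tau_1 = \begin{bmatrix} 0 \\ z^2 \oplus z^0 \cdot z^1 \end{bmatrix},$$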

where $m_0 = 2$, $m_1 = 1$, $m_2 = 0$, and $z^0 \cdot z^1$ indicates the logical AND of $z^0$ and $z^1$. Note the differences between the above equations and (2). For $t_{m_1}(z)$, it can be seen that $z^2$ selects $\tau_1$ (along with a non-linear term), since we are now partitioning $C_0$ for i = 1. It should be noted that there are many other partition chains of the 2 × 3 binary matrix space.


3.2 Partitioning a 2L-D MPSK signal space 

The 2-D MPSK signal set, denoted by $S_2(M)$, is the set of M complex roots of unity, that is, $S_2(M) = \{e^{j2\pi k/M} : k = 0, 1, \ldots, M-1\}$, where $j = \sqrt{-1}$ and $M = 2^I$ for any positive integer I. $S_2(M)$ is a group under complex multiplication. For simplicity, we write $S_2(M) = \{y : y = 0, 1, \ldots, M-1\}$, where y is the integer representation of the binary number $\mathbf{y} = [y^{I-1}, \ldots, y^1, y^0]$. This binary or natural mapping of the signal points in $S_2(M)$ is assumed throughout the paper. The 2L-D, $L \ge 2$, MPSK signal set is defined as the Cartesian product of L 2-D MPSK signal sets, that is,
$$S_{2L}(M) = S_2(M) \times S_2(M) \times \cdots \times S_2(M) \quad (L \text{ times}). \qquad (5)$$

Therefore, the 2L-D MPSK signal set is generated simply by repeating an MPSK signal set L times.

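Letting $y_l$, $l = 0, 1, \ldots, L-1$, be a sequence of L signal points in $S_2(M)$, we now form an $L \times I$ matrix
$$Y = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{L-1} \end{bmatrix} = \begin{bmatrix} y_0^{I-1} & \cdots & y_0^1 & y_0^0 \\ y_1^{I-1} & \cdots & y_1^1 & y_1^0 \\ \vdots & & \vdots & \vdots \\ y_{L-1}^{I-1} & \cdots & y_{L-1}^1 & y_{L-1}^0 \end{bmatrix}, \qquad (6)$$
where the row vectors $y_l$ represent points in 2-D space and Y represents a point in 2L-D space.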

Using the notation introduced in the last subsection, the $L \times I$ matrix subspace $\Omega^p$ consists of the $L \times I$ matrices ω defined in (3). Using (6) and (3), a 2L-D MPSK signal subset, denoted by $P^p = P(C_{m_{I-1}}, \ldots, C_{m_0})$, is obtained from $\Omega^p$ by the following mapping:
$$Y = \omega, \qquad (7)$$
i.e., $y_l^i = v_l^i$, $i = 0, 1, \ldots, I-1$ and $l = 0, 1, \ldots, L-1$. Since the matrix ω contains L rows, it is mapped into a 2L-D signal point in $S_{2L}(M)$, with the first row corresponding to the first two dimensions and the last row corresponding to the last two dimensions of the signal point; equivalently, ω is the binary representation of a signal point in $S_{2L}(M)$. Moreover, since $\Omega^p$ contains $2^{IL-p}$ $L \times I$ matrices, the signal subset $P^p$ contains $2^{IL-p}$ 2L-D signal points. Therefore, $\Omega^p$ and $P^p$ are isomorphic. The minimum squared Euclidean subset distance (MSSD or $\Delta_p^2$) of $P^p$ is given by [13]
$$\Delta_p^2 \ge \min\left(\delta_{I-1}^2 d_{m_{I-1}}, \ldots, \delta_1^2 d_{m_1}, \delta_0^2 d_{m_0}\right), \qquad (8a)$$
where
$$\delta_i^2 = 2 - 2\cos\frac{2^{i+1}\pi}{M}, \quad i = 0, 1, \ldots, I-1, \qquad (8b)$$
is the MSSD of $S_2(M/2^i)$ (recall that $M = 2^I$) and $d_{m_i}$ is the minimum Hamming distance of $C_{m_i}$, $i = 0, 1, \ldots, I-1$. Due to the symmetry of MPSK signal sets, the inequality in (8a) becomes an equality, and thus for 8PSK and 16PSK (I = 3 and 4, respectively), (8a) and (8b) lead to
$$\Delta_p^2 = \min(4d_{m_2},\ 2d_{m_1},\ 0.5858\,d_{m_0}) \qquad (9a)$$
and
$$\Delta_p^2 = \min(4d_{m_3},\ 2d_{m_2},\ 0.5858\,d_{m_1},\ 0.1522\,d_{m_0}), \qquad (9b)$$
respectively. The mapping in (7) is used in [13-16] to construct block codes in signal space.
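The numerical constants in (9a) and (9b) follow from (8b). The sketch below evaluates $\delta_i^2$ and the equality form of (8a); passing `math.inf` for a code with infinite Hamming distance (such as an (L,0) code) is a modelling assumption of the sketch, and the function names are illustrative.

```python
import math


def delta_sq(i: int, M: int) -> float:
    """Eq. (8b): delta_i^2 = 2 - 2 cos(2^(i+1) pi / M), the MSSD of S_2(M / 2^i)."""
    return 2.0 - 2.0 * math.cos(2 ** (i + 1) * math.pi / M)


def mssd_level(d_m, M: int) -> float:
    """Equality form of (8a); d_m = [d_{m_0}, ..., d_{m_{I-1}}], math.inf allowed."""
    I = int(math.log2(M))
    if len(d_m) != I:
        raise ValueError("need one Hamming distance per bit level")
    return min(delta_sq(i, M) * d for i, d in enumerate(d_m))


for M in (8, 16):
    print(M, [round(delta_sq(i, M), 4) for i in range(int(math.log2(M)))])
```

For example, `mssd_level([2, 1, 1], 8)` evaluates (9a) with $d_{m_0} = 2$ and $d_{m_1} = d_{m_2} = 1$, giving 2 × 0.5858 ≈ 1.172.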

The signal set corresponding to $\Omega^0$ is just the 2L-D MPSK signal set $S_{2L}(M)$. It is easy to see that $P^p$ is a subset of $S_{2L}(M)$, provided that $\Omega^p$ is a subspace of $\Omega^0$. $P^p$ is called the principal subset of $S_{2L}(M)$ and is a group under complex multiplication. Hence, $S_{2L}(M)/P^p$ is a $2^p$-way partition which divides $S_{2L}(M)$ into $P^p$ and its $2^p - 1$ cosets $P^p(z) = P(C_{m_{I-1}}(z), \ldots, C_{m_0}(z))$, $1 \le z \le 2^p - 1$. The signal cosets can be obtained from the corresponding matrix cosets through the mapping in (7).

Example 3.3: 

Using the mapping in (7), a 4-D 8PSK signal set partition chain, based on the 2 × 3 binary matrix space partition chain of Example 3.2, is shown in Table 2. The first three levels of the partition chain are shown schematically in Figure 4. The MSSD's are obtained from (9a). The partition chain in Figure 4 has special properties in relation to phase rotations, which can be found from the principal coset representatives. The derivation of these properties will be explained in Section 4.
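Because Table 2 itself is not reproduced here, this sketch assumes the chain of code indices that can be read off from the principal coset representatives of Example 4.1 (and from $m_0 = 2$, $m_1 = 1$, $m_2 = 0$ at p = 3 in Example 3.2): $(m_2, m_1, m_0)$ = (0,0,0), (0,0,1), (0,0,2), (0,1,2), (0,2,2), (1,2,2), (2,2,2) for p = 0, ..., 6. It maps every matrix of $\Omega^p$ to a 4-D 8PSK point through (7), computes the MSSD by exhaustive search, and compares the result with (9a). Names are illustrative.

```python
import itertools
import math

M, I, L = 8, 3, 2
# Codes of Example 3.1 (length L = 2) and their Hamming distances.
CODES = {0: list(itertools.product((0, 1), repeat=L)), 1: [(0, 0), (1, 1)], 2: [(0, 0)]}
HAMMING = {0: 1, 1: 2, 2: math.inf}
# Assumed chain, stored as (m_0, m_1, m_2) for p = 0, ..., 6.
CHAIN = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (2, 2, 0), (2, 2, 1), (2, 2, 2)]


def subset_points(ms):
    """All points of P^p: choose v^i in C_{m_i}, then y_l = sum_i 2^i v_l^i (mapping (7))."""
    pts = []
    for cols in itertools.product(*(CODES[m] for m in ms)):  # cols[i] is the column v^i
        pts.append(tuple(sum(cols[i][l] << i for i in range(I)) for l in range(L)))
    return pts


def sq_dist(y, z):
    return sum(2 - 2 * math.cos(2 * math.pi * (a - b) / M) for a, b in zip(y, z))


def brute_mssd(pts):
    return min((sq_dist(a, b) for a, b in itertools.combinations(pts, 2)), default=math.inf)


def formula_mssd(ms):
    """Equality form of (8a), i.e. (9a) for 8PSK."""
    return min((2 - 2 * math.cos(2 ** (i + 1) * math.pi / M)) * HAMMING[m]
               for i, m in enumerate(ms))


for p, ms in enumerate(CHAIN):
    pts = subset_points(ms)
    print(p, len(pts), round(brute_mssd(pts), 4), round(formula_mssd(ms), 4))
```

Under this assumed chain, the exhaustive search and (9a) agree at every level (0.586, 1.172, 2, 4, 4, 8, and ∞), and each subset has $2^{IL-p}$ points.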


Example 3.4: 

This example illustrates how to partition the 6-D 8PSK signal set. In the 3-D binary vector space $C_0 = V_3 = \{0,1\}^3$, there exists a (3,2) code and a (3,1) code with Hamming distances 2 and 3, respectively. However, the (3,1) code is not a subcode of the (3,2) code. Consequently, three different 2/2/2-way binary matrix space partition chains are possible: $C_0/C_1^1/C_2^1/C_3$, $C_0/C_1^2/C_2^2/C_3$, and $C_0/C_1^2/C_2^1/C_3$, where $C_3$ is the (3,0) code and

$C_1^1$: (3,2) code, $G_1^1 = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$, $d_1^1 = 2$;

$C_2^1$: (3,1) code, $G_2^1 = [0\ 1\ 1]$, $d_2^1 = 2$;

$C_1^2$: (3,2) code, $G_1^2 = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 1 \end{bmatrix}$, $d_1^2 = 1$;

$C_2^2$: (3,1) code, $G_2^2 = [1\ 1\ 1]$, $d_2^2 = 3$.

Note that $C_3 \subset C_2^1 \subset C_1^1 \subset C_0$, $C_3 \subset C_2^2 \subset C_1^2 \subset C_0$, and $C_3 \subset C_2^1 \subset C_1^2 \subset C_0$. A variety of 6-D 8PSK signal set partition chains can be constructed based on these three 3 × 3 binary matrix space partition chains. Two 6-D 8PSK signal set partition chains obtained by the mapping in (7) are given in Tables 3(a) and 3(b).
An 8-D 8PSK signal set partition chain is given in Table 4. Before leaving 
this section, we give one more example to show how to partition multi-D 16PSK 
signal sets. 


Example 3.5: 

In this example, we partition the 4-D 16PSK signal set. Let $C_0$, $C_1$, and $C_2$ be the (2,2), (2,1), and (2,0) binary block codes defined in Example 3.1. For $I = \log_2 16 = 4$, Table 5 illustrates the partition chain that is used. Then $\Omega^0$ is a 2 × 4 binary matrix space, and $\Omega^1, \Omega^2, \ldots, \Omega^8$ are all principal subsets of $\Omega^0$. Moreover, $\Omega^0 \supset \Omega^1 \supset \cdots \supset \Omega^8$. Therefore $\Omega^0/\Omega^1/\cdots/\Omega^8$ forms a 2/2/2/2/2/2/2/2-way binary matrix space partition chain. The first three levels of the partition chain are shown in Figure 5. The MSSD's are found from (9b).

Three 6-D 16PSK signal set partition chains are listed in Tables 6(a)-6(c). The corresponding binary matrix space partition chains can be read from these tables.

From the above discussion, we observe that various partitions can be constructed for a given multi-D MPSK signal set, and this establishes the basis for constructing good codes. It should be pointed out that Forney’s [9] squaring construction and 3-construction can also be applied to partitioning multi-D MPSK signal sets. The resulting partitions, however, may be inferior to the partitions introduced above. For example, in partitioning the 8-D 8PSK signal set using the squaring construction (or 4-construction), $\Delta_4^2 = 2$ instead of 2.343 as shown in Table 4.
4 Multi-D TC-MPSK Design

This section describes how convolutional codes are constructed for the 2L-D MPSK signal sets described previously. We first describe how to construct signal sets which have good phase rotation properties. Following this, the method used to find good convolutional codes based on parity check equations is presented.

4.1 Construction of signal sets 

In the previous section, a signal set was described in terms of the principal subset $\Omega^p$ and its cosets $\Omega^p(z)$, $0 \le z \le 2^p - 1$. In Section 3.1, it was shown that for I = 1, cosets can be constructed by using the principal coset representatives $\tau_m$. For I > 1 we can use a similar technique, where the principal coset representative $\tau^p$ at partition level p is given by
$$\tau^p = \omega^p(2^{p-1}).$$
If $m_i$ retains the same value going from partition level p − 1 to p, then $t_{m_i}(2^{p-1})$ equals the all-zero vector $\mathbf{0} = [0 \ldots 0]^T$. This can be seen in Figures 3 to 5, where at partition level p and $z = 2^{p-1}$ only those $m_i$'s that increase from p − 1 to p have any effect on the coset. There is no principal coset representative for p = 0, since the binary matrix space $\Omega^0$ has no cosets. Also note that $\tau^p \in \Omega^{p-1}$ and $\tau^p \notin \Omega^p$.

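Example 4.1:

For the partition chain in Table 2, the principal coset representatives $\tau^p$ for the 4-D 8PSK signal space are
$$\begin{aligned}
\tau^1 &= [t_0(1), t_0(1), t_1(1)] = [\mathbf{0}, \mathbf{0}, \tau_1],\\
\tau^2 &= [t_0(2), t_0(2), t_2(2)] = [\mathbf{0}, \mathbf{0}, \tau_2],\\
\tau^3 &= [t_0(4), t_1(4), t_2(4)] = [\mathbf{0}, \tau_1, \mathbf{0}],\\
\tau^4 &= [t_0(8), t_2(8), t_2(8)] = [\mathbf{0}, \tau_2, \mathbf{0}],\\
\tau^5 &= [t_1(16), t_2(16), t_2(16)] = [\tau_1, \mathbf{0}, \mathbf{0}],\\
\tau^6 &= [t_2(32), t_2(32), t_2(32)] = [\tau_2, \mathbf{0}, \mathbf{0}],
\end{aligned}$$
where $\tau_1 = [0\ 1]^T$, $\tau_2 = [1\ 1]^T$, and $\mathbf{0}$ is the all-zero vector $[0\ 0]^T$.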
As in (2), we can find the coset representatives at any partition level by the modulo-2 addition of the respective principal coset representatives, i.e.,
$$\omega^p(z) = \bigoplus_{j=0}^{p-1} z^j \tau^{j+1}, \quad 0 \le z \le 2^p - 1. \qquad (10)$$

An alternative and more useful way of forming the cosets is as follows. An $L \times 1$ M-ary vector ω can be formed such that
$$\omega = \sum_{i=0}^{I-1} 2^i v^i, \qquad (11)$$
where modulo-M arithmetic is used and I is the number of non-zero values of $m_i$ at partition level p. Then the principal coset representatives can be expressed in integer form as
$$\tau^p = \sum_{i=0}^{I-1} 2^i\, t_{m_i}(2^{p-1}) = \begin{bmatrix} \tau_0^p \\ \tau_1^p \\ \vdots \\ \tau_{L-1}^p \end{bmatrix}, \qquad (12)$$
where modulo-$M = 2^I$ arithmetic is used. Note that $\tau^p$ is an $L \times 1$ vector and that its elements belong to the set $\{0, 1, \ldots, M-1\}$. The coset representatives at partition level p are then
$$\omega^p(z) = \sum_{j=0}^{p-1} z^j \tau^{j+1}, \quad 0 \le z \le 2^p - 1, \qquad (13)$$
where modulo-M arithmetic is used.


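Example 4.2:

For the principal coset representatives found in Example 4.1, the principal coset representatives in integer form for 4-D 8PSK are found from (12) as
$$\tau^1 = \begin{bmatrix}0\\1\end{bmatrix},\quad \tau^2 = \begin{bmatrix}1\\1\end{bmatrix},\quad \tau^3 = \begin{bmatrix}0\\2\end{bmatrix},\quad \tau^4 = \begin{bmatrix}2\\2\end{bmatrix},\quad \tau^5 = \begin{bmatrix}0\\4\end{bmatrix},\quad \tau^6 = \begin{bmatrix}4\\4\end{bmatrix}.$$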
To find the coset representative (in integer form) at partition level p = 3 and for z = 3, we see from Table 2 that I = 2, and hence from (13)
$$\omega^3(3) = z^2\tau^3 + z^1\tau^2 + z^0\tau^1 = 0\begin{bmatrix}0\\2\end{bmatrix} + 1\begin{bmatrix}1\\1\end{bmatrix} + 1\begin{bmatrix}0\\1\end{bmatrix} = \begin{bmatrix}1\\2\end{bmatrix},$$
where modulo-4 arithmetic is used.
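Equations (12) and (13) are straightforward to mechanize. This sketch stores the principal coset representatives of Example 4.1 as bit columns (listed here in the order $v^0, v^1, v^2$), converts them to integer form with (12), and evaluates (13) with modular addition. It can also be used to check the non-linear carry term that appears in the coset representatives of $\Omega^3$ in Example 3.2. All names are illustrative.

```python
M, L = 8, 2
TAU_1, TAU_2, ZERO = (0, 1), (1, 1), (0, 0)  # tau_1, tau_2 and the all-zero column
# Principal coset representatives of Example 4.1, columns listed as (v^0, v^1, v^2).
TAU_COLUMNS = {
    1: (TAU_1, ZERO, ZERO),
    2: (TAU_2, ZERO, ZERO),
    3: (ZERO, TAU_1, ZERO),
    4: (ZERO, TAU_2, ZERO),
    5: (ZERO, ZERO, TAU_1),
    6: (ZERO, ZERO, TAU_2),
}


def to_integer(columns, modulus=M):
    """Eq. (12): element-wise sum_i 2^i v^i, reduced modulo M."""
    return tuple(sum(col[l] << i for i, col in enumerate(columns)) % modulus for l in range(L))


TAU_INT = {p: to_integer(cols) for p, cols in TAU_COLUMNS.items()}


def coset_rep_int(p: int, z: int, modulus: int = M):
    """Eq. (13): omega^p(z) = sum_{j=0}^{p-1} z^j tau^(j+1), with modular addition."""
    acc = [0] * L
    for j in range(p):
        if (z >> j) & 1:
            acc = [(a + t) % modulus for a, t in zip(acc, TAU_INT[j + 1])]
    return tuple(acc)


print(TAU_INT)
print(coset_rep_int(3, 3, modulus=4))  # the case worked out in Example 4.2
```

Expanding `coset_rep_int(3, z, 4)` back into bits reproduces $t_{m_0}(z) = [z^1,\ z^1 \oplus z^0]^T$ and the carry $z^0 \cdot z^1$ in $t_{m_1}(z)$: the integer addition is what makes the binary description non-linear.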


In a practical implementation of an encoder, a single point in multi-D space, given a value of z, can be found by partitioning down to level p = IL. At this partition level the coset representatives themselves are the actual points in signal space. We call this full partitioning. Let y(z) represent each 2L-D MPSK point Y in integer form, i.e.,
$$y(z) = \begin{bmatrix} y_0 \\ y_1 \\ \vdots \\ y_{L-1} \end{bmatrix}, \quad \text{where } y_l = \sum_{i=0}^{I-1} 2^i y_l^i, \quad l = 0, \ldots, L-1. \qquad (14)$$


The variable z is used in y(z) since each point Y can now be described in terms of z. Thus, with full partitioning, we obtain (for p = IL)
$$y(z) = \omega^{IL}(z) = \sum_{j=0}^{IL-1} z^j \tau^{j+1}, \quad 0 \le z \le 2^{IL} - 1, \qquad (15)$$
where addition is modulo-M.
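With full partitioning, (15) gives the complete encoder mapping for 4-D 8PSK. The sketch assumes the integer principal coset representatives $\tau^1, \ldots, \tau^6$ = [0 1]^T, [1 1]^T, [0 2]^T, [2 2]^T, [0 4]^T, [4 4]^T, obtained from Example 4.1 via (12), and checks that the 64 values of z reach all 64 points of the 4-D 8PSK signal set exactly once.

```python
M, L = 8, 2
# Assumed integer principal coset representatives tau^1, ..., tau^6 for 4-D 8PSK.
TAU_INT = [(0, 1), (1, 1), (0, 2), (2, 2), (0, 4), (4, 4)]
IL = len(TAU_INT)  # I * L = 6 label bits


def y_of_z(z: int):
    """Eq. (15): y(z) = sum_{j=0}^{IL-1} z^j tau^(j+1), modulo M."""
    y = [0] * L
    for j, tau in enumerate(TAU_INT):
        if (z >> j) & 1:
            y = [(a + t) % M for a, t in zip(y, tau)]
    return tuple(y)


points = [y_of_z(z) for z in range(2 ** IL)]
print(len(set(points)), "distinct points out of", M ** L)
```

Because the mapping is one-to-one, an encoder can use it directly as a 64-entry look-up table from z to the pair of 8PSK labels.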

Equation (15) can now be used to describe a signal point in 2L-D space with MPSK modulation. The number of bits $z^j$ used to describe a signal point is IL. If the least significant bit (lsb) is used for coding, we can form a rate $(IL-1)/IL$ code. Other rates can also be formed by setting the q lsb's of the mapping to 0. We do this to ensure that the MSSDs are as large as possible, so that the best codes can be found. Therefore we let


$$\mathbf y^{q}(z) = \sum_{j=q}^{IL-1} z^{j-q}\,\mathbf t_{j+1}, \qquad 0 \le z \le 2^{IL-q}-1, \quad 0 \le q \le L-1, \qquad (16)$$

where $\mathbf y^q(z)$ represents a point z in 2L-D MPSK signal space such that the first q bits of (15) are 0, and addition is modulo-M. Now $z = [z^{IL-q-1}, \ldots, z^1, z^0]$, where the lsb of z is always the coding bit. This ensures that the parity check equations can always be expressed in terms of z without depending on the type and partition level of the signal set used. From (16), codes of rate $(IL-q-1)/(IL-q)$ can be formed. An upper limit of $q = L-1$ is set
because for $q \ge L$ the signal set is partitioned such that $d_{m_0} = \infty$, i.e., an $M/2^j$-PSK, $j \ge 1$, signal set is being used (one exception is the 8-D 8PSK signal set (Table 4), where $d_{m_0} = \infty$ for $q \ge L+1$). The MSSDs range from $\Delta_q^2$ to $\Delta_{IL}^2$, and the uncoded minimum squared Euclidean distance (SED) is $\Delta_{q+1}^2$.


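Example 4.3:

We can form a rate 4/5 code with 4-D 8PSK modulation (q = 1, L = 2, I = 3). Then

$$\mathbf y^{1}(z) = z^0\mathbf t_2 + z^1\mathbf t_3 + z^2\mathbf t_4 + z^3\mathbf t_5 + z^4\mathbf t_6 = z^0\begin{bmatrix}1\\1\end{bmatrix} + z^1\begin{bmatrix}0\\2\end{bmatrix} + z^2\begin{bmatrix}2\\2\end{bmatrix} + z^3\begin{bmatrix}0\\4\end{bmatrix} + z^4\begin{bmatrix}4\\4\end{bmatrix},$$

where addition is modulo-8. The uncoded minimum SED is $\Delta_2^2 = 2.0$, which is the same as uncoded QPSK.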
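The following sketch builds the rate 4/5 signal set $\mathbf y^1(z)$ of (16) and measures two minimum squared distances numerically: the MSSD of the full 32-point set ($\Delta_1^2$ here) and that of the subset with the coding bit $z^0$ fixed, which should reproduce the uncoded value 2.0. It assumes the same $\mathbf t_1, \ldots, \mathbf t_6$ for 4-D 8PSK as read off the expansion of $\mathbf y(z)$ in Example 4.4, and unit-radius 8PSK, so a one-step phase difference contributes $4\sin^2(\pi/8) \approx 0.586$. The helper names are hypothetical.

```python
import itertools
import math

M, L, I = 8, 2, 3                                       # 4-D 8PSK
T = [(0, 1), (1, 1), (0, 2), (2, 2), (0, 4), (4, 4)]    # t_1..t_6 (assumed)


def y_q(z: int, q: int) -> tuple:
    """y^q(z) = sum_{j=q}^{IL-1} z^{j-q} t_{j+1} (mod M), as in (16)."""
    y = [0] * L
    for j in range(q, I * L):
        if (z >> (j - q)) & 1:
            y = [(a + b) % M for a, b in zip(y, T[j])]
    return tuple(y)


def sed(p1: tuple, p2: tuple) -> float:
    """Squared Euclidean distance, unit-radius MPSK in every 2-D component."""
    return sum(4 * math.sin(math.pi * (u - v) / M) ** 2 for u, v in zip(p1, p2))


def mssd(points) -> float:
    """Minimum squared distance over all pairs of distinct points."""
    return min(sed(p1, p2) for p1, p2 in itertools.combinations(points, 2))


q = 1                                                   # rate (IL - q - 1)/(IL - q) = 4/5
pts = [y_q(z, q) for z in range(2 ** (I * L - q))]
print(len(set(pts)))                                    # 32
print(round(mssd(pts), 3))                              # 1.172 (Delta_1^2)
# Fixing the coding bit z^0 = 0 leaves the uncoded comparison set.
uncoded = [p for z, p in enumerate(pts) if (z & 1) == 0]
print(round(mssd(uncoded), 3))                          # 2.0 (Delta_2^2)
```

The exhaustive pairwise search is only practical for small sets like this one; it is meant as a check on the tabulated distances, not as a search tool.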


4.2 Effect of a 360°/M phase rotation on a Multi-D MPSK signal set 
The reason for constructing the signal set as in (16) is that there are at most I bits in z affected by a signal set rotation of $\Phi = 360^\circ/M$. For 8PSK and 16PSK, this corresponds to rotations of 45° and 22.5°, respectively. Initially, we consider all possible mapped bits, and thus q = 0.

Consider that a 2-D MPSK signal set has been rotated by $\Phi$. Since we are using natural mapping, the integer representation of the rotated signal point is $y_r = y + 1$, where y is the integer representation of the signal point before rotation and modulo-M addition is used. If binary notation is used, then

$$y_r^0 = y^0 \oplus 1, \qquad (17a)$$
$$y_r^1 = y^1 \oplus y^0, \qquad (17b)$$
$$y_r^2 = y^2 \oplus y^0 y^1. \qquad (17c)$$


If there are $I = \log_2 M$ bits in a signal set, then all I bits are affected by a phase rotation of $\Phi$.
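Relations (17a)-(17c) are simply the carry chain of a binary counter. A short Python check for 8PSK (I = 3) with natural mapping, written only to illustrate the equations:

```python
def bits(y: int) -> tuple:
    """Natural-mapping bits (y^0, y^1, y^2) of an 8PSK phase index y."""
    return (y & 1, (y >> 1) & 1, (y >> 2) & 1)


for y in range(8):
    y0, y1, y2 = bits(y)
    r0, r1, r2 = bits((y + 1) % 8)          # rotation by Phi = 45 degrees
    assert r0 == y0 ^ 1                     # (17a)
    assert r1 == y1 ^ y0                    # (17b)
    assert r2 == y2 ^ (y0 & y1)             # (17c)
print("(17a)-(17c) hold for all eight 8PSK points")
```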

We now consider the first partition of a multi-D MPSK signal set. The bit $z^0$ is used to select one of two partitions, $P(C_0, \ldots, C_0, C_1)$ or $P(C_0, \ldots, C_0, C_1(1))$. We know from (17a) that the lsb's are inverted by a $\Phi$ phase rotation. Then, if all the code words in $C_1$ remain code words in $C_1$ when inverted, $z^0$ will remain the same after a phase rotation. That is, writing $\bar C$ for the set of inverted (complemented) code words of a code $C$, if $\bar C_1 = C_1$, then $z_r^0 = z^0$. However, if $\bar C_1 = C_1(1)$, then $z_r^0 = z^0 \oplus 1$, as can be seen from the set partition. A simple way to tell whether a block code has the property that $\bar C_{m_i} = C_{m_i}$, or whether $\bar C_{m_i}$ equals one of its cosets, is to examine its coset representative at that partition level. Assume $\mathbf r_{m_i} = [1 \ldots 1]^T = \mathbf 1$. Since $\mathbf r_{m_i} \in C_{m_i-1}$, $\bar C_{m_i-1} = C_{m_i-1}$ follows from code linearity (the inverse of $\mathbf r_{m_i} = \mathbf 1$ is the all-zero vector $\mathbf 0$). However, we also have $\mathbf r_{m_i} \notin C_{m_i}$, and thus the inverses of $\mathbf 0$ and all the other vectors in $C_{m_i}$ form a coset of $C_{m_i}$ (again for linear codes). Thus, if $\mathbf r_{m_i} \ne \mathbf 1$ at partition level p, then $z_r^{p-1} = z^{p-1}$; otherwise, $z_r^{p-1} \ne z^{p-1}$.

For $C_{m_0}$, we can always say that if $\mathbf r_{m_0} = \mathbf 1$ at partition level p, then $z_r^{p-1} = z^{p-1} \oplus 1$; otherwise, $z_r^{p-1} = z^{p-1}$, since the additions for $y_l^0$, $l = 0, 1, \ldots, L-1$, are modulo-2 using either (10) or (16) to map a signal point. However, for $i \ge 1$, (10) gives signal sets which have $IL-I-1$ bits affected by a phase rotation. This is because an inverted $z^p$ which affects $C_{m_0}$ will cause some signal points to rotate in different directions. However, using the mapping in (16), all the signal points will rotate in the same direction, since modulo-M arithmetic is used. Thus, using the mapping in (16),

$$\begin{aligned}
z_r^{p_0-1} &= z^{p_0-1} \oplus 1, \\
z_r^{p_1-1} &= z^{p_1-1} \oplus z^{p_0-1}, \\
z_r^{p_2-1} &= z^{p_2-1} \oplus z^{p_0-1} z^{p_1-1}, \\
&\;\;\vdots
\end{aligned}$$

where the $p_i$'s, $0 \le i \le I-1$, correspond to the partition levels where $\mathbf r_{m_i} = \mathbf 1$, and for all other partition levels, $z_r^{p-1} = z^{p-1}$. That is, the $p_i$'s indicate which bits are affected by a phase rotation of $\Phi$.

Example 4.4:

Consider the 4-D 8PSK signal set with a rate 5/6 encoder. By examining Table 2, we see that $p_0 = 2$, $p_1 = 4$, and $p_2 = 6$ correspond to the partition levels where $\mathbf r_{m_i} = \mathbf 1 = [1\;1]^T$. Thus the effect of a 45° phase rotation on the signal set is

$$\begin{aligned}
z_r^1 &= z^1 \oplus 1, \\
z_r^2 &= z^2, \\
z_r^3 &= z^3 \oplus z^1, \\
z_r^4 &= z^4, \\
z_r^5 &= z^5 \oplus z^1 z^3.
\end{aligned} \qquad (18)$$


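The phase invariance of the mapping used for the 4-D 8PSK signal set can be checked as follows. From (14) and (15) the signal outputs can be described in terms of z as

$$\begin{aligned}
\begin{bmatrix} y_0 \\ y_1 \end{bmatrix}
&= \begin{bmatrix} 4y_0^2 + 2y_0^1 + y_0^0 \\ 4y_1^2 + 2y_1^1 + y_1^0 \end{bmatrix} \\
&= z^5\begin{bmatrix}4\\4\end{bmatrix} + z^4\begin{bmatrix}0\\4\end{bmatrix} + z^3\begin{bmatrix}2\\2\end{bmatrix} + z^2\begin{bmatrix}0\\2\end{bmatrix} + z^1\begin{bmatrix}1\\1\end{bmatrix} + z^0\begin{bmatrix}0\\1\end{bmatrix} \\
&= (4z^5 + 2z^3 + z^1)\begin{bmatrix}1\\1\end{bmatrix} + (4z^4 + 2z^2 + z^0)\begin{bmatrix}0\\1\end{bmatrix},
\end{aligned}$$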


where all additions are modulo-8. After a 45° phase rotation, we have $y_{l,r} = y_l + 1$ for $l = 0, 1$. Thus from above we can form the following phase rotation equations:

$$\begin{bmatrix} y_{0,r} \\ y_{1,r} \end{bmatrix} = (4z^5 + 2z^3 + z^1 + 1)\begin{bmatrix}1\\1\end{bmatrix} + (4z^4 + 2z^2 + z^0)\begin{bmatrix}0\\1\end{bmatrix}.$$

Note that a 1 is added to the term whose coset is $[1\;1]^T$. Hence this term "absorbs" the effect of the phase rotation, leaving the remaining term unaffected. Thus from (17), we can form the phase rotation equations given in (18). Had the signal set been constructed using (10), only $z^0$ would have remained unchanged by a 45° phase rotation.
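The derivation of (18) can also be confirmed by brute force. The sketch below assumes the same $\mathbf t_1, \ldots, \mathbf t_6$ as in the expansion of $\mathbf y(z)$ above, inverts the mapping (15) with a lookup table, rotates every point by 45° (adding 1 modulo 8 to each phase index) and compares the recovered bits with (18). Function names are illustrative only.

```python
# Exhaustive check of (18) for 4-D 8PSK. The coset representatives t_1..t_6
# are assumed from the expansion of y(z) in Example 4.4.
M, L, I = 8, 2, 3
T = [(0, 1), (1, 1), (0, 2), (2, 2), (0, 4), (4, 4)]


def y_of_z(z):
    """Full-partitioning mapping (15): y(z) = sum_j z^j t_{j+1} (mod 8)."""
    y = [0] * L
    for j in range(I * L):
        if (z >> j) & 1:
            y = [(a + b) % M for a, b in zip(y, T[j])]
    return tuple(y)


def bit(z, j):
    return (z >> j) & 1


z_of_y = {y_of_z(z): z for z in range(2 ** (I * L))}
assert len(z_of_y) == 64                    # the mapping is one-to-one

for z in range(64):
    # A 45-degree rotation adds 1 (mod 8) to each 2-D phase index y_l.
    zr = z_of_y[tuple((v + 1) % M for v in y_of_z(z))]
    assert bit(zr, 0) == bit(z, 0)
    assert bit(zr, 1) == bit(z, 1) ^ 1
    assert bit(zr, 2) == bit(z, 2)
    assert bit(zr, 3) == bit(z, 3) ^ bit(z, 1)
    assert bit(zr, 4) == bit(z, 4)
    assert bit(zr, 5) == bit(z, 5) ^ (bit(z, 1) & bit(z, 3))
print("(18) holds for all 64 signal points")
```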
We have shown that for q = 0, the bits that are affected by a phase rotation of $\Phi$ are $z^{p_j-1}$, $0 \le j \le I-1$. For q > 0 the bits that are affected are $z^{p_j-q-1}$, $0 \le j \le I-1$. However, depending on the signal set, $p_j - q - 1$ may be less than zero for some j. If this is true, the minimum phase transparency will be $2^d\Phi$, where d is the number of terms $p_j - q - 1$ that are less than zero, and the number of bits (s) that are affected by a $2^d\Phi$ phase rotation is $s = I - d$. For example, the 6-D 8PSK signal set in Table 3(a) has $p_0 = 1$, $p_1 = 4$, and $p_2 = 7$. Thus if q = 1, then $p_0 - q - 1 = -1$, which is less than zero, implying that d = 1, and thus there will be only $s = I - d = 2$ bits affected by a $2\Phi = 90°$ phase rotation. Note that a phase rotation of $\Phi = 45°$ of this signal set will produce its coset.

Fortunately, for the codes and signal sets considered in this paper, the above complication does not occur. This is partly due to the fact that for many signal sets with q = 0, the $L-1$ lsb's are not affected by a phase rotation of $\Phi$. Since we consider only signal sets with $0 \le q \le L-1$ in this paper, d = 0. For those signal sets where this is not true (e.g., in some 6-D signal sets), it has been found that the convolutional codes produced are inferior (in either minimum FSED or number of nearest neighbors) to an alternative signal set with d = 0. Therefore, we will not consider the above effect further.

When a signal set is combined with a convolutional encoder, we must consider the effect of rotating coded sequences. A result similar to the one above is obtained, in that, depending on the code and the signal set, the signal set can be rotated in multiples of $2^d\Phi$ and still produce valid code sequences. We define d to be the degree of transparency. The actual determination of d is described in Section 4.4. Also, the number of bits (s) that are affected by a phase rotation is $s = I - d$.

For $0 \le q \le L-1$, the actual bits that are affected by a phase rotation of $\Phi$ are $z^{b_j}$, where $b_j = p_j - q - 1$, $0 \le j \le I-1$. More generally, the bits that are affected by a phase rotation of $2^d\Phi$ are $z^{c_j}$, where $c_j = p_{j+d} - q - 1$, $0 \le j \le s-1$. These two separate notations ($b_j$ and $c_j$) are used because the determination of d depends on $b_j$.

4.3 The general encoder system

From the above information we can now construct a suitable encoder, which is illustrated in Figure 6. The general multi-D encoder system consists of five sections: the differential precoder, the binary convolutional encoder, the multi-D signal mapper, the parallel to serial converter, and the 2-D signal mapper. In this paper the convolutional encoder is assumed to be systematic with feedback as in [1]. That is, $z^j(D) = x^j(D)$, $1 \le j \le k$, where D is the delay operator and polynomial notation is used. The parity sequence $z^0(D)$ will be some function of itself and the $x^j(D)$, $1 \le j \le k$. The parity check equation of an encoder describes the relationship in time of the encoded bit streams. It is a very useful and efficient means of describing a convolutional code, since it is independent of the input/output encoder relationships. For an $R = k/(k+1)$ code, the parity check equation is

$$H^{\tilde k}(D)z^{\tilde k}(D) \oplus \cdots \oplus H^1(D)z^1(D) \oplus H^0(D)z^0(D) = \mathbf 0(D), \quad 1 \le \tilde k \le k, \qquad (19)$$

where $\tilde k$ is the number of input sequences that are checked by the encoder, $H^j(D)$, $0 \le j \le \tilde k$, is the parity check polynomial of $z^j(D)$, and $\mathbf 0(D)$ is the all-zero sequence.

Since the encoder is systematic, the differential precoder only preconditions those bits which are affected by a phase rotation, i.e., the input bits into the encoder which need to be preconditioned are $w^{c_0}, w^{c_1}, \ldots, w^{c_{s-1}}$. If $c_0 = 0$, we replace $w^0$ (which does not exist) by $z^0$, as shown in Figure 6 by the dashed line. For example, an encoder for a rate 8/9 code which uses the 6-D (partition I) 8PSK signal set given in Table 3(a) may (depending on the phase transparency) need this modification. This is because this signal set has $b_0 = 0$, and thus if the code has d = 0, then $z^0$ will need to be precoded. Figure 7 illustrates the two types of precoders. Note that the storage elements have a delay of LT, where T is the symbol period of each 2-D signal point that is transmitted by the 2-D signal mapper. Figure 7(a) illustrates the precoder with $c_0 > 0$, where there are s inputs that need to be precoded. The basic component of the precoder is the modulo-$2^s$ binary adder. For most codes this is the precoder to be used. Figure 7(b) gives the other case, where $c_0 = 0$ and $s-1$ input bits are precoded (the other precoded bit being $z^0$). For the bits that are not precoded, $x^i = w^i$, $i \ne c_j$.
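The precoder of Figure 7(a) is described above only in words. A minimal Python model, assuming that the s precoded bits $w^{c_0}, \ldots, w^{c_{s-1}}$ are packed into an integer with $w^{c_0}$ as the lsb, and that the modulo-$2^s$ adder accumulates with a delay of one multi-D symbol (LT). The receiver-side inverse shown here (a modulo-$2^s$ subtraction) is our assumption of the matching differential decoder, and the function names are illustrative.

```python
def precode(w_seq, s):
    """Differential precoder: x_n = (w_n + x_{n-1}) mod 2^s, with x_{-1} = 0.

    Each w_n packs the s precoded input bits w^{c_0}..w^{c_{s-1}} (w^{c_0} = lsb);
    one step of the recursion corresponds to one multi-D symbol (delay LT).
    """
    x_prev, out = 0, []
    for w in w_seq:
        x_prev = (w + x_prev) % (1 << s)
        out.append(x_prev)
    return out


def differential_decode(x_seq, s):
    """Inverse operation at the receiver: w_n = (x_n - x_{n-1}) mod 2^s."""
    x_prev, out = 0, []
    for x in x_seq:
        out.append((x - x_prev) % (1 << s))
        x_prev = x
    return out


s = 2                                       # e.g. w^2 and w^5 in Example 4.5
w = [3, 0, 1, 2, 2, 1]
x = precode(w, s)
# A 2^d * Phi rotation adds 1 (mod 2^s) to every precoded word.
x_rot = [(v + 1) % (1 << s) for v in x]
print(differential_decode(x, s))            # [3, 0, 1, 2, 2, 1]
print(differential_decode(x_rot, s))        # [0, 0, 1, 2, 2, 1]: only the first word differs
```

Because the rotation shifts every precoded word by the same constant, the differences, and hence the data, survive; only the first symbol, which has no predecessor, is affected.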
At this point, we summarize the notation and indicate the limits on the parameters used in the search for good codes. For a rate $(IL-q-1)/(IL-q)$ code,

I = no. of bits in each 2-D signal ($3 \le I \le 4$),
M = $2^I$ = no. of signal points in each 2-D signal set,
L = no. of 2-D signal sets ($2 \le L \le 4$ for 8PSK and $2 \le L \le 3$ for 16PSK),
p = partition level of signal set ($0 \le p \le IL$),
q = the partition level p where mapping begins ($0 \le q \le L-1$),
z = signal set mapping parameter ($0 \le z \le 2^{p-q}-1$),
k = $IL - q - 1$ = no. of input bits to encoder,
$\Phi$ = 360°/M = minimum phase transparency with q = 0,
d = degree of phase transparency ($2^d\Phi$, $0 \le d \le I$),
s = $I - d$ = no. of bits in z affected by a $2^d\Phi$ phase rotation ($0 \le s \le I$),
$c_j = p_{j+d} - q - 1$ = the bits $z^{c_j}$ affected by a $2^d\Phi$ phase rotation ($0 \le j \le s-1$).


There are two types of systematic convolutional encoders that can be constructed. Before proceeding with the description of these encoders, we return to the parity check equation given in (19). As in [1], we define $\nu$ to be the maximum degree of all the parity check polynomials $H^j(D)$, $0 \le j \le \tilde k$. For $\tilde k < j \le k$, $H^j(D) = 0$, since the bits corresponding to these polynomials are not checked by the encoder. If $\tilde k < \nu$, the parity check polynomials are of the form

$$H^j(D) = h^j_{\nu-1}D^{\nu-1} \oplus \cdots \oplus h^j_1 D, \quad 1 \le j \le \tilde k, \qquad (20a)$$
$$H^0(D) = D^{\nu} \oplus h^0_{\nu-1}D^{\nu-1} \oplus \cdots \oplus h^0_1 D \oplus 1, \qquad (20b)$$

i.e., $h^j_\nu = h^j_0 = 0$ for $1 \le j \le \tilde k$. Equations (20) ensure that the incremental SED between paths leaving a state, and between paths entering a state, is at least $\Delta^2_{q+1}$. Thus codes can be found that have an FSED or $d^2_{\text{free}}$ (the minimum SED between all possible coded sequences) of at least $2\Delta^2_{q+1}$, where $\Delta^2_{q+1}$ is the $d^2_{\text{free}}$ of the uncoded comparison system. A theoretical justification for constructing codes in this manner has been found in [17], where it is shown, using random coding arguments, that these codes have a large FSED on average. A minimal systematic encoder can be implemented from (20), since $h^0_\nu = h^0_0 = 1$ [1].
The encoding equations are

$$z^j(D) = x^j(D), \quad 1 \le j \le k, \qquad (21a)$$
$$z^0(D) = H^{\tilde k}(D)x^{\tilde k}(D) \oplus \cdots \oplus H^1(D)x^1(D) \oplus \bigl(H^0(D) \oplus 1\bigr)z^0(D). \qquad (21b)$$

An encoder implementation using (20) is shown in Figure 8(a).
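For illustration, (21) can be transcribed directly into Python. This is a sketch, not the register-level circuit of Figure 8(a): each polynomial is stored as an integer whose bit i holds the coefficient of $D^i$ (the same convention as the octal labels of the code tables), the encoder starts from the all-zero state, and the parity check polynomials are those of Example 4.5. The last line evaluates the left-hand side of (19) along the encoded sequence; the function names are hypothetical.

```python
import random


def encode(x_streams, H):
    """Systematic feedback encoder following (21).

    x_streams[j-1] is the input bit sequence x^j (j = 1..k_tilde); H[j] is H^j(D)
    as an int (bit i = coefficient of D^i). Returns [z^0, z^1, ..., z^k_tilde].
    """
    assert H[0] & 1, "h^0_0 must be 1"
    z0 = []
    for n in range(len(x_streams[0])):
        acc = 0
        for j, xs in enumerate(x_streams, start=1):          # H^j(D) x^j(D)
            for i in range(H[j].bit_length()):
                if (H[j] >> i) & 1 and n - i >= 0:
                    acc ^= xs[n - i]
        for i in range(1, H[0].bit_length()):                # (H^0(D) + 1) z^0(D)
            if (H[0] >> i) & 1 and n - i >= 0:
                acc ^= z0[n - i]
        z0.append(acc)
    return [z0] + [list(xs) for xs in x_streams]             # z^j = x^j (systematic)


def parity_check(z, H):
    """Left-hand side of (19) at every time n; all zeros for a valid code sequence."""
    out = []
    for n in range(len(z[0])):
        acc = 0
        for j, zs in enumerate(z):
            for i in range(H[j].bit_length()):
                if (H[j] >> i) & 1 and n - i >= 0:
                    acc ^= zs[n - i]
        out.append(acc)
    return out


# Example 4.5 (k_tilde = 2, nu = 4): H^0 = D^4+D^3+D+1, H^1 = D, H^2 = D^3+D^2.
H = [0b11011, 0b10, 0b1100]
random.seed(1)
x = [[random.randint(0, 1) for _ in range(20)] for _ in range(2)]
z = encode(x, H)
print(any(parity_check(z, H)))              # False: every parity check is satisfied
```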

For all codes with $\nu = 1$ and for some codes with $\nu > 1$, $\tilde k = \nu$. For these codes we cannot set $h^j_\nu = 0$, $1 \le j \le \tilde k$. This is because $\tilde k$ checked bits require at least $\tilde k$ terms in $H^j(D)$, $1 \le j \le \tilde k$, that are variable. If there are not enough variables, then there will be some non-zero $\tilde{\mathbf x} = [x^{\tilde k}, \ldots, x^2, x^1]$ such that $\bigoplus_{j=1}^{\tilde k} x^j h^j_m = 0$, $1 \le m \le \nu$. That is, there will be more than $2^{k-\tilde k}$ parallel transitions between states in the trellis. To avoid this problem, when $\tilde k = \nu$, we let the parity check polynomials be

$$H^j(D) = h^j_\nu D^\nu \oplus \cdots \oplus h^j_1 D \oplus 0, \quad 1 \le j \le \tilde k, \qquad (22a)$$
$$H^0(D) = h^0_\nu D^\nu \oplus \cdots \oplus h^0_1 D \oplus 1. \qquad (22b)$$

In (22), there is always at least one term $h^j_\nu$, $1 \le j \le \tilde k$, that is equal to one if the number of variables $\tilde k$ is to be maintained. Thus the degree of the encoder remains at $\nu$. The $d^2_{\text{free}}$ is at least $\Delta^2_q + \Delta^2_{q+1}$, since the minimum incremental SED between paths leaving a state is $\Delta^2_{q+1}$ (since $h^j_0 = 0$, $1 \le j \le \tilde k$, and $h^0_0 = 1$) and between paths entering a state is $\Delta^2_q$ (since $h^j_\nu \in \{0, 1\}$ for $0 \le j \le \tilde k$). The encoding equations are given by (21), and an encoder implementation for $\tilde k = \nu$ is shown in Figure 8(b).

The multi-D signal mapper can be implemented by using cosets of the signal set, the value of q, and (16). Figure 9 illustrates an implementation of the multi-D signal mapper. Note that only modulo-M adders are required to implement the signal mapper. The thick lines in Figure 9 represent the I bits for each MPSK signal point. Due to the set partitioning, many of the coefficients are equal to zero and the non-zero coefficients have only one non-zero bit. Thus, only one line is needed to represent each coefficient.

The second to last section of the encoder is the parallel to serial converter,
which takes the L groups of I bits and forms a stream with I bits in each group. 
That is, we are assuming a channel which is limited to transmitting one 2-D 
signal point at a time. A representation of a parallel to serial converter is shown 
in Figure 10. Finally the 2-D signal mapper takes the I bits for each 2-D signal 
point and produces the required real and imaginary (or amplitude and phase) 
components for a modulator. 

Example 4.5:

In this example, we describe how to implement a particular code. The code is used with a 6-D 8PSK signal set. Thus L = 3 and I = 3. We also have q = 1, so that a rate 7/8 code is formed. The partition that is used is given in Table 3(b), from which we obtain $p_0 = 3$, $p_1 = 4$, and $p_2 = 7$. The code is 90° transparent and thus d = 1 and s = 2. Therefore $c_0 = p_1 - q - 1 = 2$ and $c_1 = p_2 - q - 1 = 5$. Thus bits $w^2$ and $w^5$ are precoded using a modulo-4 adder. Since $c_0 > 0$, the precoder given in Figure 7(a) is used. For this code, $\tilde k = 2$, and the parity check polynomials are $H^0(D) = D^4 \oplus D^3 \oplus D \oplus 1$, $H^1(D) = D$, and $H^2(D) = D^3 \oplus D^2$. Excluding the parallel to serial converter and the 2-D signal mapper, the encoder is shown in Figure 11. This code has 16 states ($\nu = 4$). Note that the multi-D signal mapper does not exactly correspond to Figure 9. This is because the terms have been collected so as to minimize the number of modulo-8 adders that are required. Also note that the bits other than $z^1$ which are tapped L = 3 times are precoded, since the code is 90° transparent.


4.4 Convolutional Encoder Effects on Transparency 

As mentioned previously, the convolutional encoder can affect the total transparency of the system. The method used to determine transparency is to examine the parity check equation and the bits that are affected by a phase rotation. A code is transparent if its parity check equation, after substituting $z^j(D)$ with $z_r^j(D)$, $0 \le j \le \tilde k$ (the rotated sequences), remains the same. There will normally be at most I bits that are affected by a phase rotation, $z^{b_0}, \ldots, z^{b_{I-1}}$, $b_j = p_j - q - 1$, $0 \le j \le I-1$. That is,

$$z_r^{b_0} = z^{b_0} \oplus 1, \qquad (23a)$$
$$z_r^{b_1} = z^{b_1} \oplus z^{b_0}, \qquad (23b)$$
$$z_r^{b_2} = z^{b_2} \oplus z^{b_0} z^{b_1}. \qquad (23c)$$


Assume that $0 \le b_0 \le \tilde k$ and $b_j > \tilde k$, $1 \le j \le I-1$. Then only one term in the parity check equation is affected by a phase rotation. The other bits have no effect since they are not checked by the encoder. The parity check equation after a phase rotation of $\Phi$ becomes

$$H^{\tilde k}(D)z^{\tilde k}(D) \oplus \cdots \oplus H^{b_0}(D)\bigl(z^{b_0}(D) \oplus \mathbf 1(D)\bigr) \oplus \cdots \oplus H^0(D)z^0(D) = \mathbf 0(D),$$

which can be rewritten as

$$H^{\tilde k}(D)z^{\tilde k}(D) \oplus \cdots \oplus H^{b_0}(D)z^{b_0}(D) \oplus \cdots \oplus H^0(D)z^0(D) = E[H^{b_0}(D)]\,\mathbf 1(D), \qquad (24)$$

where $E[H^{b_0}(D)]$ is the modulo-2 number of non-zero terms in $H^{b_0}(D)$ and $\mathbf 1(D)$ is the all-ones sequence. Thus if there is an even number of terms in $H^{b_0}(D)$, (24) will be the same as (19). That is, the code is transparent to integer multiples of $\Phi$ phase rotations of the signal set. However, if there is an odd number of terms in $H^{b_0}(D)$, then $E[H^{b_0}(D)] = 1$ and a coset of the convolutional code is produced. Even though the two equations are closely related, the codes are quite different, and a decoder will not be able to produce correctly decoded data after a phase rotation of the signal set.

Now assume that the first two terms are affected by a phase rotation, i.e., $0 \le b_0, b_1 \le \tilde k$, and $b_j > \tilde k$, $2 \le j \le I-1$. The terms in the parity check polynomial $H^{b_0}(D)z^{b_0}(D) \oplus H^{b_1}(D)z^{b_1}(D)$ now become

$$\bigl(H^{b_0}(D) \oplus H^{b_1}(D)\bigr)z^{b_0}(D) \oplus H^{b_1}(D)z^{b_1}(D) \oplus E[H^{b_0}(D)]\,\mathbf 1(D).$$

In this case the parity check equation will be different after a phase rotation. This does not mean that the code is not transparent to any multiple of $\Phi$ phase rotations. In fact, the code could be transparent to $2\Phi$ or $4\Phi$ phase rotations.
This is because the phase rotation equations reduce to

$$\begin{aligned}
z_r^{b_{d-1}} &= z^{b_{d-1}}, \\
z_r^{b_d} &= z^{b_d} \oplus 1, \\
z_r^{b_{d+1}} &= z^{b_{d+1}} \oplus z^{b_d},
\end{aligned}$$

for a $2^d\Phi$ phase rotation, where d = 1 or 2. If there is an even number of terms in $H^{b_1}(D)$, then d = 1. This is because the even number of non-zero terms in $H^{b_1}(D)$ cancels the effect on $z^{b_1}(D)$ when the signal set is rotated by $2\Phi$. That is, the code is transparent to integer multiples of $2\Phi$ phase rotations and no less. If there is an odd number of non-zero terms, this canceling effect cannot occur, and then d = 2, giving a phase transparency of $4\Phi$.

In general, for $0 \le b_0, \ldots, b_l \le \tilde k$ and $b_j > \tilde k$, $l < j \le I-1$, where $0 \le l \le I-1$, we have $d = l + E[H^{b_l}(D)]$. Then we can determine those bits $z^{c_j}$ which are affected by a $2^d\Phi$ phase rotation, i.e., $c_j = b_{j+d} = p_{j+d} - q - 1$, $0 \le j \le s-1$, where $s = I - d$.

Example 4.6:

For the code given in Example 4.5, $\tilde k = 2$, I = 3, and q = 1. Thus $b_0 = 1$, $b_1 = 2$, $b_2 = 5$, and $0 \le b_0, b_1 \le 2$. Therefore l = 1 and $d = 1 + E[H^{b_1}(D)] = 1 + E[D^3 \oplus D^2] = 1$. Thus the code is 90° transparent, and $c_0 = 2$ and $c_1 = 5$.
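The rule $d = l + E[H^{b_l}(D)]$ is mechanical enough to code directly. The sketch below assumes the $b_j$ are non-negative and increasing, which holds for the signal sets used here, and, as an assumption not stated explicitly above, returns d = 0 when none of the affected bits is checked by the encoder. The inputs reproduce Example 4.6 ($b_j = p_j - q - 1$ with $p = (3, 4, 7)$, q = 1); function names are hypothetical.

```python
def parity_of_terms(h: int) -> int:
    """E[H(D)]: the number of non-zero terms of H(D), modulo 2 (bit i <-> D^i)."""
    return bin(h).count("1") % 2


def transparency(H, b, k_tilde, I):
    """Return (d, [c_0, ..., c_{s-1}]) following Section 4.4.

    H[j] is H^j(D) stored as an int; b = [b_0, ..., b_{I-1}] with b_j = p_j - q - 1,
    assumed non-negative and increasing, so b_0..b_l are exactly the affected
    bits that are checked by the encoder (b_j <= k_tilde).
    """
    checked = [j for j, bj in enumerate(b) if bj <= k_tilde]
    if checked:
        l = checked[-1]
        d = l + parity_of_terms(H[b[l]])    # d = l + E[H^{b_l}(D)]
    else:
        d = 0       # assumption: no checked bit is affected, so the code is Phi-transparent
    s = I - d
    return d, [b[j + d] for j in range(s)]  # c_j = b_{j+d}


# Example 4.6: H^0 = D^4+D^3+D+1, H^1 = D, H^2 = D^3+D^2.
H = [0b11011, 0b10, 0b1100]
d, c = transparency(H, b=[1, 2, 5], k_tilde=2, I=3)
print(d, c, 45 * 2 ** d)                    # 1 [2, 5] 90
```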

4.5 Systematic search for good small constraint length codes 

For each multi-D signal set considered there are a number of code rates for which $\nu$ can range from one to as large as one wishes. As $\nu$ is increased, a comprehensive code search becomes time consuming due to the greater complexity of each code. We have thus limited our search to $\nu \le 6$. The criteria used to find the best codes are the FSED ($d^2_{\text{free}}$), the number of nearest neighbors ($N(d^2_{\text{free}})$), and the code transparency (d). The code search algorithm that was implemented is similar to that in [1], but with a number of differences, which include the extra criteria mentioned above.

The actual code search involves using a rate $\tilde k/(\tilde k+1)$ code. Thus two separate notations are used to distinguish the rate $k/(k+1)$ encoder and the simplified rate $\tilde k/(\tilde k+1)$ encoder. For the rate $k/(k+1)$ encoder, we have $\mathbf x_n = [x_n^k, \ldots, x_n^1]$ (the input to the encoder) and $\mathbf z_n = [z_n^k, \ldots, z_n^0]$ (the mapped bits or encoder output) at time n. Also, $\mathbf e_n = [e_n^k, \ldots, e_n^1, e_n^0]$ is the modulo-2 difference between two encoder outputs $\mathbf z_n$ and $\mathbf z'_n$ at time n, i.e., $\mathbf e_n = \mathbf z_n \oplus \mathbf z'_n$. There are $2^{k+1}$ combinations of $\mathbf z_n$ and $\mathbf z'_n$ that give the same $\mathbf e_n$. For the rate $\tilde k/(\tilde k+1)$ code, we denote reduced versions of $\mathbf x_n$, $\mathbf z_n$, and $\mathbf e_n$ as $\tilde{\mathbf x}_n = [x_n^{\tilde k}, \ldots, x_n^1]$, $\tilde{\mathbf z}_n = [z_n^{\tilde k}, \ldots, z_n^0]$, and $\tilde{\mathbf e}_n = [e_n^{\tilde k}, \ldots, e_n^1, e_n^0]$, respectively.

In order to find $d^2_{\text{free}}$ for a particular code, the squared Euclidean weights (SEWs) $w^2(\mathbf e_n)$ were used. As defined in [1], $w^2(\mathbf e_n)$ is the minimum SED between all combinations $a(\mathbf z_n)$ and $a(\mathbf z'_n)$ such that $\mathbf e_n = \mathbf z_n \oplus \mathbf z'_n$, where $a(\mathbf z_n)$ is the actual signal point in 2L-D space. This can be defined as

$$w^2(\mathbf e_n) = \min_{\text{all } \mathbf z_n} d^2\bigl[a(\mathbf z_n), a(\mathbf z_n \oplus \mathbf e_n)\bigr], \qquad (25)$$

where $d^2[a(\mathbf z_n), a(\mathbf z'_n)]$ is the SED between $a(\mathbf z_n)$ and $a(\mathbf z'_n)$. One can then use the all-zero path to find $d^2_{\text{free}}$ in a code search, i.e.,

$$d^2_{\text{free}} = \min \sum_n w^2(\mathbf e_n),$$

where the minimization is over all allowable code sequences with the exception of the all-zero sequence.
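As an illustration of (25), the sketch below tabulates all SEWs for the small rate 4/5, 4-D 8PSK mapping of Example 4.3 (q = 1, k = 4), assuming $\mathbf t_1, \ldots, \mathbf t_6$ from the expansion in Example 4.4 and unit-radius 8PSK. With k = 4 the brute-force loop performs $2^{2k+2} = 1024$ distance evaluations. Function names are hypothetical.

```python
import math

M, L, I, q = 8, 2, 3, 1                                 # 4-D 8PSK, rate 4/5 (k = 4)
T = [(0, 1), (1, 1), (0, 2), (2, 2), (0, 4), (4, 4)]    # t_1..t_6 (assumed)
NBITS = I * L - q                                       # k + 1 mapped bits


def a(z):
    """Signal point y^q(z) of (16), as a tuple of 8PSK phase indices."""
    y = [0] * L
    for j in range(q, I * L):
        if (z >> (j - q)) & 1:
            y = [(u + v) % M for u, v in zip(y, T[j])]
    return tuple(y)


def d2(p1, p2):
    """SED between two 2L-D points (unit-radius 8PSK in each 2-D component)."""
    return sum(4 * math.sin(math.pi * (u - v) / M) ** 2 for u, v in zip(p1, p2))


def sew(e):
    """w^2(e) = min over all z of d^2[a(z), a(z XOR e)], as in (25)."""
    return min(d2(a(z), a(z ^ e)) for z in range(2 ** NBITS))


w2 = {e: sew(e) for e in range(2 ** NBITS)}
print(round(min(v for e, v in w2.items() if e != 0), 3))                 # 1.172
print(round(min(v for e, v in w2.items() if e != 0 and e % 2 == 0), 3))  # 2.0
```

The second value shows that error patterns which leave the coding bit unchanged are at least $\Delta_2^2 = 2.0$ apart, in agreement with Example 4.3.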

Since there are $2^{k+1}$ values of $\mathbf e_n$, there are a total of $2^{2k+2}$ computations required to find all the values of $w^2(\mathbf e_n)$. Thus, for a rate 11/12 code with 8-D 8PSK modulation, there are nearly 17 million computations required. This can be reduced by letting $z_n^0 = 0$ (or 1) in $\mathbf z_n$ and minimizing (25) over all $\mathbf z_n = [z_n^k, \ldots, z_n^1, 0]$, as suggested in [1]. This reduces the number of computations to $2^{2k+1}$. In fact, it is possible to decrease the number of computations even further. It can be shown that the L output bits corresponding to cosets $\mathbf t_p$ with the largest integer value can be set to zero. This is due in part to the MPSK signals being antipodal for these values. Thus the total number of computations required is $2^{2k-L+1}$.
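For the rate 11/12, 8-D 8PSK case quoted above (k = 11, L = 4), the three operation counts work out as

$$2^{2k+2} = 2^{24} = 16\,777\,216, \qquad 2^{2k+1} = 2^{23} = 8\,388\,608, \qquad 2^{2k-L+1} = 2^{19} = 524\,288,$$

so the two reductions together cut the work by a factor of $2^{5} = 32$.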

In order to reduce the time needed to find $d^2_{\text{free}}$, we note that the trellis is equivalent to a rate $\tilde k/(\tilde k+1)$ code with $2^{k-\tilde k}$ parallel transitions. There are $2^{\tilde k+1}$ different sets of these transitions. If the minimum SEW is found for each of these sets of parallel transitions, the code search is greatly simplified, since a rate $\tilde k/(\tilde k+1)$ code is all that needs to be searched and $\tilde k$ is usually small. Thus, the SEWs required for a rate $\tilde k/(\tilde k+1)$ code search are

$$w^2(\tilde{\mathbf e}_n) = \min w^2(\mathbf e_n), \qquad (26)$$

where the minimization is over all $[e_n^k, \ldots, e_n^{\tilde k+1}]$. The FSED for this reduced code (which we call $d^2_{\text{free}}(\tilde k)$) can be larger than $d^2_{\text{free}}$, since $d^2_{\text{free}}$ might be limited along the parallel transitions by an MSSD of $\Delta^2_{q+\tilde k+1}$, i.e.,

$$d^2_{\text{free}} = \min\bigl(d^2_{\text{free}}(\tilde k), \Delta^2_{q+\tilde k+1}\bigr). \qquad (27)$$
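Continuing the small rate 4/5, 4-D 8PSK illustration (same assumed $\mathbf t_1, \ldots, \mathbf t_6$, unit-radius 8PSK), the following sketch forms the reduced SEWs of (26) for an illustrative choice $\tilde k = 1$ and reads off the parallel-transition limit $\Delta^2_{q+\tilde k+1} = \Delta^2_3$ that enters (27). The value of $\tilde k$ and the reduced FSED passed to `d2_free` are hypothetical.

```python
import math

M, L, I, q = 8, 2, 3, 1                                 # 4-D 8PSK, rate 4/5
T = [(0, 1), (1, 1), (0, 2), (2, 2), (0, 4), (4, 4)]    # t_1..t_6 (assumed)
K = I * L - q - 1                                       # k = 4
K_TILDE = 1                                             # hypothetical number of checked bits


def a(z):
    """Signal point y^q(z) of (16), as 8PSK phase indices."""
    y = [0] * L
    for j in range(q, I * L):
        if (z >> (j - q)) & 1:
            y = [(u + v) % M for u, v in zip(y, T[j])]
    return tuple(y)


def d2(p1, p2):
    return sum(4 * math.sin(math.pi * (u - v) / M) ** 2 for u, v in zip(p1, p2))


# Full SEWs w^2(e_n), from (25).
N = 2 ** (K + 1)
w2 = {e: min(d2(a(z), a(z ^ e)) for z in range(N)) for e in range(N)}

# Reduced SEWs, from (26): minimise over the unchecked bits e^k..e^{k_tilde+1}.
low = (1 << (K_TILDE + 1)) - 1
w2_red = {}
for e, v in w2.items():
    if e:                                   # skip the all-zero difference
        w2_red[e & low] = min(v, w2_red.get(e & low, math.inf))

par_limit = w2_red.pop(0)                   # e != 0 with e~ = 0: parallel transitions
print({et: round(v, 3) for et, v in sorted(w2_red.items())})  # {1: 1.172, 2: 2.0, 3: 1.172}
print(round(par_limit, 3))                  # Delta_3^2 = 4.0


def d2_free(d2_free_reduced):
    """(27): the FSED is capped by the parallel-transition MSSD."""
    return min(d2_free_reduced, par_limit)


print(round(d2_free(5.0), 3))               # 4.0 for a hypothetical reduced FSED of 5.0
```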

The best value of $\tilde k$ can be determined from the FSED of the best code for the previous value of $\nu$. The search starts with $\nu = 1$ and $\tilde k = 1$. Then $\nu$ is increased by one, and if the FSED of the previous best code was $d^2_{\text{free}}(\tilde k) < \Delta^2_{q+\tilde k+1}$, then $\tilde k$ remains the same. This is because the limit of the parallel transitions $\Delta^2_{q+\tilde k+1}$ has not yet been reached, and the trellis connectivity needs to be reduced in order to increase $d^2_{\text{free}}(\tilde k)$. If the FSED of the previous best code was $\Delta^2_{q+\tilde k+1}$ (i.e., limited by the parallel transitions), then $\tilde k$ is increased by one from the previous value; otherwise, the FSED and the number of nearest neighbors would remain the same. If $d^2_{\text{free}}(\tilde k) = \Delta^2_{q+\tilde k+1}$ for the previous best code, then $\tilde k$ can remain the same or increase by one. Both values of $\tilde k$ must be tried in order to find the best code.

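$N(d^2_{\text{free}})$ is the number of nearest neighbors between all paths with an SED of $d^2_{\text{free}}$. If $d^2_{\text{free}} = d^2_{\text{free}}(\tilde k)$, an upper bound on $N(d^2_{\text{free}})$ can be found by determining the number (A) of paths with weight $d^2_{\text{free}}(\tilde k)$ in the equivalent rate $\tilde k/(\tilde k+1)$ code. Let the binary error sequence which occurs along a path a, with length $N_a$ and FSED $d^2_{\text{free}}(\tilde k)$, be

$$\tilde{\mathbf e}^a(D) = \tilde{\mathbf e}^a_1 D \oplus \tilde{\mathbf e}^a_2 D^2 \oplus \cdots \oplus \tilde{\mathbf e}^a_{N_a} D^{N_a}, \quad \tilde{\mathbf e}^a_1, \tilde{\mathbf e}^a_{N_a} \ne \mathbf 0, \quad N_a \ge 1.$$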


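An upper bound on $N(d^2_{\text{free}})$ is

$$N(d^2_{\text{free}}) \le \sum_{a=1}^{A} \prod_{n=1}^{N_a} m(\tilde{\mathbf e}^a_n), \qquad (28)$$

where $m(\tilde{\mathbf e}_n) = \#[w^2(\tilde{\mathbf e}_n) = w^2(\mathbf e_n)]$ is the number of times that $w^2(\tilde{\mathbf e}_n) = w^2(\mathbf e_n)$ over all $[e_n^k, \ldots, e_n^{\tilde k+1}]$. That is, we sum the multiplicities of all possible minimum weight error events. On the other hand, if $d^2_{\text{free}} = \Delta^2_{q+\tilde k+1}$, then

$$N(d^2_{\text{free}}) \le \#\bigl[\Delta^2_{q+\tilde k+1} = w^2(\mathbf e_n)\bigr] \qquad (29)$$

over all $\mathbf e_n = [e_n^k, \ldots, e_n^{\tilde k+1}, 0, \ldots, 0]$. If $d^2_{\text{free}}(\tilde k) = \Delta^2_{q+\tilde k+1}$, then the right-hand sides of (28) and (29) are added to determine an upper bound on $N(d^2_{\text{free}})$.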

The reason that (28) and (29) are upper bounds is that for some $\mathbf e_n$ and $\mathbf z_n$, $w^2(\mathbf e_n) \ne d^2[a(\mathbf z_n), a(\mathbf z_n \oplus \mathbf e_n)]$, due to the definition of $w^2(\mathbf e_n)$ in (25). This results in average numbers of nearest neighbors, which must be determined. Equations (28) and (29) assume the worst case and hence result in an upper bound. A precise value of $N(d^2_{\text{free}})$ for $d^2_{\text{free}} = d^2_{\text{free}}(\tilde k)$ [17] is

$$N(d^2_{\text{free}}) = \sum_{a=1}^{A} \prod_{n=1}^{N_a} \bar m(\tilde{\mathbf e}^a_n), \qquad (30)$$

where

$$\bar m(\tilde{\mathbf e}_n) = \sum \frac{\#\bigl[w^2(\tilde{\mathbf e}_n) = d^2[a(\mathbf z_n), a(\mathbf z_n \oplus \mathbf e_n)]\bigr]}{2^k}, \qquad (31)$$

the $\sum$ is over all $[e_n^k, \ldots, e_n^{\tilde k+1}]$ for which $w^2(\tilde{\mathbf e}_n) = w^2(\mathbf e_n)$, and the $\#$ is over all $\mathbf z_n = [z_n^k, \ldots, z_n^1, 0]$. That is, $\bar m(\tilde{\mathbf e}_n)$ is the sum of all the average numbers of nearest neighbors for each signal point in each set of parallel transitions. Note that the summation in (31) is upper bounded by $m(\tilde{\mathbf e}_n)$. Similarly, for $d^2_{\text{free}} = \Delta^2_{q+\tilde k+1}$,

$$N(d^2_{\text{free}}) = \sum \frac{\#\bigl[\Delta^2_{q+\tilde k+1} = d^2[a(\mathbf z_n), a(\mathbf z_n \oplus \mathbf e_n)]\bigr]}{2^k}, \qquad (32)$$

where the $\sum$ in (32) is over all $[e_n^k, \ldots, e_n^{\tilde k+1}, 0, \ldots, 0]$ for which $\Delta^2_{q+\tilde k+1} = w^2(\mathbf e_n)$. If $d^2_{\text{free}}(\tilde k) = \Delta^2_{q+\tilde k+1}$, then $N(d^2_{\text{free}})$ is the sum of the right-hand sides of (30) and (32).


Example 4.7:

For the code given in Example 4.5, we have $\tilde k = 2$ for a rate 7/8 code with a 6-D 8PSK type II signal set. After determining the mapping of the signal set, (25) can be used to find the SEWs for each signal point. Equation (26) determines the $w^2(\tilde{\mathbf e}_n)$'s that are to be used to find the best rate 2/3 codes. For this signal set $\Delta^2_{q+\tilde k+1} = \Delta^2_4 = 4.0$. That is, $\Delta^2_4 = 4.0$ is the minimum SED that occurs between parallel transitions. Using (29), we can determine an upper bound of 19 on $N(\Delta^2_{q+\tilde k+1})$. In the code search for the best rate 2/3 code, there may be many codes which have the largest $d^2_{\text{free}}(\tilde k)$ of 4.343. Thus (28) was used to determine an upper bound on $N(d^2_{\text{free}}(\tilde k))$ for each best code using an appropriate algorithm and $m(\tilde{\mathbf e}_n)$. Table 7 gives, for each $\tilde{\mathbf e}_n$, the values of $w^2(\tilde{\mathbf e}_n)$ and $m(\tilde{\mathbf e}_n)$ that were used in the code search. The best code with a transparency of 90° was found to have $N(d^2_{\text{free}}(\tilde k)) \le 432$. Thus $d^2_{\text{free}} = 4.0$ and $d^2_{\text{next}} = 4.343$, where $d^2_{\text{next}}$ is the next smallest SED.


In order to reduce the number of codes that need to be tested in a code search algorithm, rejection rules can be used. As in [1], time reversal of the parity check polynomials can be used to reject codes. Since $w^2(\tilde{\mathbf e}_n)$ and $m(\tilde{\mathbf e}_n)$ are used to find the best codes, Rule 2 in [1] cannot be fully exploited. In the code search, a rate $\tilde k/(\tilde k+1)$ code is used at a particular $\nu$. For some of these codes parallel transitions can occur. These codes may be rejected before the algorithms are used to generate an encoder trellis and find $d^2_{\text{free}}(\tilde k)$. If, for some input $\tilde{\mathbf x}_n \ne \mathbf 0$, the inputs into the systematic encoder are all zero, then parallel transitions will occur. This is because this non-zero input will cause the state of the encoder to go from one state to the next as if a zero input had occurred. Thus parallel transitions will occur in the rate $\tilde k/(\tilde k+1)$ code, which should not have parallel transitions. That is, if for some $\tilde{\mathbf x}_n \ne \mathbf 0$, $\bigoplus_{j=1}^{\tilde k} x_n^j \mathbf h^j = \mathbf 0$, where $\mathbf h^j = [h^j_\nu, \ldots, h^j_1, h^j_0]$, then the code is rejected. Similarly, we can reject codes with parity check polynomials $\mathbf h^j$, $1 \le j \le l < \tilde k$, if $\bigoplus_{j=1}^{l} x_n^j \mathbf h^j = \mathbf 0$ for some non-zero $[x_n^l, \ldots, x_n^1]$. Rule 3 in [1] can also be used to eliminate codes.
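The parallel-transition rejection rule amounts to asking whether the coefficient vectors $\mathbf h^1, \ldots, \mathbf h^{\tilde k}$ are linearly dependent over GF(2). A brute-force sketch, with each $\mathbf h^j$ stored as an integer (bit i = $h^j_i$); the function name is ours, and the second call uses a hypothetical code with $H^1 = H^2$:

```python
from itertools import product


def has_parallel_transitions(H, k_tilde):
    """True if some non-zero x~ = [x^k~, ..., x^1] gives XOR_{j=1}^{k~} x^j h^j = 0.

    H[j] holds the coefficient vector of H^j(D) as an int (bit i <-> h^j_i);
    H[0] is not used by this test.
    """
    for x in product((0, 1), repeat=k_tilde):
        if not any(x):
            continue                        # only non-zero inputs matter
        acc = 0
        for j, xj in enumerate(x, start=1):
            if xj:
                acc ^= H[j]
        if acc == 0:
            return True                     # reject: this input never reaches the state
    return False


# Example 4.5 polynomials (H^1 = D, H^2 = D^3 + D^2): no such input exists.
print(has_parallel_transitions([0b11011, 0b10, 0b1100], 2))   # False
# Hypothetical code with H^1 = H^2 = D^2 + D: x~ = [1, 1] cancels, so it is rejected.
print(has_parallel_transitions([0b11011, 0b110, 0b110], 2))   # True
```

Enumerating all $2^{\tilde k} - 1$ inputs is cheap because $\tilde k$ is small; for larger $\tilde k$ a GF(2) rank computation would give the same answer.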
An approximate lower bound for the symbol error probability [1] is



where $E_b/N_0$ is the energy per information bit to single-sided noise density ratio and $R_{\text{eff}} = (IL-q-1)/L$ (bit/T) is the average number of information bits per 2-D signal transmitted. Thus in our code search we attempt to maximize $d^2_{\text{free}}$ and to minimize $N(d^2_{\text{free}})$. In (33) the average multiplicity of errors is normalized to that of a 2-D signal set.

Two programs were used in the code search, one for codes with $\tilde k < \nu$ and the other for codes with $\tilde k = \nu$. For specific values of I, L, and q, $\mathbf y^q(z)$, $0 \le z \le 2^{IL-q}-1$, was generated using the coset representatives $\mathbf t_p$, $1 \le p \le IL$, that are given in Tables 2-6. The squared Euclidean weights $w^2(\mathbf e_n)$ were then calculated using (25) for all $\mathbf e_n$. Since the value of $\tilde k$ can change with each $\nu$, $w^2(\tilde{\mathbf e}_n)$ and $m(\tilde{\mathbf e}_n)$ were computed, if necessary, as the program went from the smallest $\nu$ to the largest $\nu$.

The code search used the various rejection rules before the time-consuming tasks such as finding $d^2_{\text{free}}(\tilde k)$ (using the bi-directional search algorithm [18]) and $N(d^2_{\text{free}}(\tilde k))$ (using a trellis search technique). A variable $d^2_{\text{lim}}$ was used (as in [1]) to indicate the largest $d^2_{\text{free}}(\tilde k)$ found at a particular stage in the search. Another variable $N_{\text{lim}}$ was used to indicate the smallest $N(d^2_{\text{lim}})$ found during the code search with a phase transparency of $2^d\Phi$. $d^2_{\text{lim}}$ and $N_{\text{lim}}$ were set to zero and infinity, respectively, before the code search began. Alternatively, $d^2_{\text{lim}}$ and $N_{\text{lim}}$ could be set equal to the best $d^2_{\text{free}}(\tilde k)$ and $N(d^2_{\text{free}}(\tilde k))$ found in a previous search. This was the case when one program was used for $\tilde k = \nu$ (since we start with $\nu = 1$) and then the other program was used for $\tilde k < \nu$.

Any code that passed through the code rejection rules based on the parity check equation had its $d^2_{\text{free}}(\tilde k)$ computed. If it was less than $d^2_{\text{lim}}$, this code was rejected and the next code searched. For those codes whose $d^2_{\text{free}}(\tilde k)$ was the same as or greater than $d^2_{\text{lim}}$, $N(d^2_{\text{free}}(\tilde k))$ was then computed. Also, from the values of $p_i$, $0 \le i \le I-1$, for the signal set used, the phase transparency (d) of the code was determined. Another stage of rejection was applied to those codes that had $d^2_{\text{free}}(\tilde k) = d^2_{\text{lim}}$: those codes were rejected if $N(d^2_{\text{free}}(\tilde k)) > N_{\text{lim}}$. When $d^2_{\text{free}}(\tilde k)$ was greater than $d^2_{\text{lim}}$, $d^2_{\text{lim}}$ and $N_{\text{lim}}$ were set to the corresponding values of this new code. The code was then listed along with its $d^2_{\text{free}}(\tilde k)$, $N(d^2_{\text{free}}(\tilde k))$, and phase transparency d. A small list of codes was thus produced from which the best codes could be chosen. Note that only those codes with the largest $d^2_{\text{free}}(\tilde k)$ were accepted, regardless of their phase transparency. The advantage of this is that it reduces the number of codes to be searched, usually at the cost of rejecting codes with a better phase transparency but a smaller $d^2_{\text{free}}(\tilde k)$.

Since $\Delta^2_p$ for each of the signal sets is given in Tables 2 to 6, it was a simple matter to determine $d^2_{\text{free}}$ for each code from (27). For those codes where $d^2_{\text{free}}$ occurred along parallel transitions only, $d^2_{\text{next}}$ is also given in the code tables, since this is equal to $d^2_{\text{free}}(\tilde k)$. Note that since $m(\tilde{\mathbf e}_n)$ was used in the code search, the $N(d^2_{\text{free}})$ given in the tables is an upper bound.

The asymptotic coding gain $\gamma$ of each code compared to the uncoded case is shown in the tables, i.e.,

$$\gamma = 10\log_{10}\!\left(\frac{d^2_{\text{free}}}{d^2_u}\right) \text{ (dB)}, \qquad (34)$$

where $d^2_u$ is the smallest FSED of an equivalent uncoded 2-D or multi-D scheme. In nearly all cases, $d^2_u = \Delta^2_{q+1}$, since for codes with a non-integer $R_{\text{eff}}$ no equivalent 2-D MPSK code exists which has the same $R_{\text{eff}}$, and so the equivalent uncoded multi-D signal set is used instead. For the 8-D 8PSK signal set with q = 3, $R_{\text{eff}} = 2$ bit/T. Thus, a natural comparison would be against uncoded QPSK, which has $d^2_u = 2$. In this case, $\Delta^2_{q+1} = 2.343$, which would give lower coding gains and be inconsistent with other codes that also have $R_{\text{eff}} = 2$ bit/T. The asymptotic coding gains compared to uncoded (M/2)PSK are found by adding to $\gamma$ the appropriate correction factor

7M /2 = 101og 10 ( L{I k _ t) (dB), (35) 

as shown in the code tables. The transparency (in degrees) is also given for each 
code. An alternative and more abstract comparison is to uncoded 2 i * e(r PSK, as 
suggested by Forney [19]. The correction factor in this case is 
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7 F = 10 log 10 


(dB). 


(37) 


4sin 2 (2 - ^ ff 7r) J 
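The two correction factors can be checked against the $\gamma_4$ and $\gamma_F$ footers of the code tables. The Python sketch below is a minimal check, assuming $M = 2^I$ and unit signal energy; the function names are hypothetical.

```python
import math


def gamma_half_db(d2_u: float, k: int, L: int, I: int) -> float:
    """Correction factor (35) relative to uncoded (M/2)PSK, with M = 2**I."""
    M = 2 ** I
    d2_half = 4.0 * math.sin(2.0 * math.pi / M) ** 2   # MSED of unit-energy (M/2)PSK
    return 10.0 * math.log10(k * d2_u / (L * (I - 1) * d2_half))


def gamma_forney_db(d2_u: float, r_eff: float) -> float:
    """Correction factor (37) relative to a hypothetical uncoded 2**R_eff-PSK."""
    return 10.0 * math.log10(d2_u / (4.0 * math.sin(math.pi * 2.0 ** -r_eff) ** 2))


# 4-D 8PSK, R = 5/6 (Table 8(a)): k = 5, L = 2, I = 3, d2_u = 1.172, R_eff = 2.5
print(round(gamma_half_db(1.172, 5, 2, 3), 2), round(gamma_forney_db(1.172, 2.5), 2))  # -1.35 0.23
```

The same two functions reproduce the footers of Tables 9(a), 9(b) and 10(a) as well (see the assertions below), which supports the reading of (35) with the factor $k/[L(I-1)]$ normalizing to equal energy per bit.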

Codes for rate $R = 5/6$ and $4/5$, 4-D TC-8PSK are listed in Tables 8(a) and 8(b), respectively. Equivalent $R = 5/6$, 4-D TC-8PSK codes with up to 16 states have been found independently by LaFanechere and Costello [6] and by Wilson [7], although with reduced phase transparency. Rate $R$ = 8/9, 7/8, and 6/7, 6-D TC-8PSK codes are given in Tables 9(a), 9(b), and 9(c), respectively. Rate $R$ = 11/12, 10/11, 9/10, and 8/9, 8-D TC-8PSK codes are listed in Tables 10(a), 10(b), 10(c), and 10(d), respectively. Rate $R$ = 7/8 and 6/7, 4-D TC-16PSK codes are given in Tables 11(a) and 11(b), respectively. Finally, rate $R$ = 11/12 and 10/11, 6-D TC-16PSK codes are listed in Tables 12(a) and 12(b), respectively. The multi-D, 2-state TC-8PSK and TC-16PSK codes were also found by Divsalar and Simon [20]. The parity check polynomials are expressed in octal notation in the code tables, e.g., $h^0(D) = D^6 + D^4 + D^2 + D + 1 = (001\,010\,111)_2 = (127)_8$.
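The octal convention can be made concrete with a small conversion helper. This Python sketch (function names are illustrative) packs the coefficients with the constant term as the least significant bit, which reproduces the $(127)_8$ example above; the entry $h^1 = 030$ from Table 8(a) then decodes to $D^4 + D^3$.

```python
from typing import Iterable, List


def poly_to_octal(exponents: Iterable[int]) -> str:
    """Octal string of a binary polynomial given the exponents of its nonzero terms."""
    value = 0
    for e in exponents:
        value |= 1 << e          # set the coefficient of D**e
    return format(value, "o")


def octal_to_exponents(octal: str) -> List[int]:
    """Exponents of the nonzero terms, highest first, of an octal-coded polynomial."""
    value = int(octal, 8)
    return [e for e in range(value.bit_length() - 1, -1, -1) if (value >> e) & 1]


print(poly_to_octal([6, 4, 2, 1, 0]))   # 127
print(octal_to_exponents("030"))        # [4, 3]
```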

4.6 Decoder implementation 

When the Viterbi algorithm is used in the decoder implementation, a measure of decoding complexity is given by $2^{\nu+\tilde k}/L$. This is the number of distinct transitions in the trellis diagram for TCM schemes, normalized to a 2-D signal set. The maximum bit rate of the decoder is $k f_d$, where $f_d$ is the symbol speed of the decoder. Since $k$ is quite large for multi-D signal sets (at least $(I-1)L$), high bit rates can be achieved. For example, a Viterbi decoder has been constructed for a rate 7/9 periodically time varying trellis code (PTVTC) with $\nu = 4$, $\tilde k = 2$, and 8PSK modulation [21]. This decoder has $f_d = 60$ MHz and a bit rate of 140 Mbit/s, where $f_d$ equals the 2-D symbol rate. However, with the equivalent rate 7/8 code with 6-D 8PSK modulation, the bit rate will be $L = 3$ times as fast, i.e., 420 Mbit/s. The branch metric calculator, though, will be more complicated due to the larger number of parallel transitions between states. Alternatively, one could build a decoder at 20 MHz for the same bit rate of 140 Mbit/s. In addition to providing decreased decoder complexity, this multi-D code has an asymptotic coding gain which is 0.56 dB greater and is 90° transparent, compared with a 180° transparency for the PTVTC [22].
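The throughput and complexity figures quoted above follow from simple arithmetic. This Python sketch reproduces them, assuming that $f_d$ is the decoder cycle rate with one trellis branch of $k$ information bits per cycle; the function names are hypothetical.

```python
def trellis_complexity(nu: int, k_tilde: int, L: int) -> float:
    """Distinct trellis transitions per 2-D signal interval: 2**(nu + k_tilde) / L."""
    return 2 ** (nu + k_tilde) / L


def max_bit_rate(k: int, f_d_mhz: float) -> float:
    """Maximum decoder bit rate k * f_d in Mbit/s (one trellis branch per decoder cycle)."""
    return k * f_d_mhz


# Rate 7/8, 6-D 8PSK code: k = 7, nu = 4, k_tilde = 2, L = 3
print(trellis_complexity(4, 2, 1), trellis_complexity(4, 2, 3))  # 2-D PTVTC vs 6-D code
print(max_bit_rate(7, 60.0), max_bit_rate(7, 20.0))               # 420.0 140.0
```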

Although the decoding complexity of the Viterbi algorithm is measured in terms of $2^{\nu+\tilde k}/L$, for multi-D schemes the complexity of subset (parallel transition) decoding must also be taken into account due to the large number of parallel transitions. For the multi-D TC-MPSK codes considered here, since the subsets are block codes in signal space (the principal subset) or cosets of a block code, the suboptimum algorithm proposed in [13] can be used to decode each subset. At high signal-to-noise ratios, this algorithm is only slightly inferior to optimum decoding, while the subset decoding complexity is significantly reduced. Optimum decoding requires $2^{k-\tilde k} - 1$ branch comparisons, while suboptimum decoding requires $\sum_{i=0}^{I-1}(2^{L-m_i} - 1)$ comparisons to decode a subset at partition level $p = \tilde k + q + 1$. For example, for the 3.67 bit/T 6-D 16PSK code with 16 states given in Table 12(a), where $\tilde k = 3$, we require $2^{k-\tilde k} - 1 = 2^8 - 1 = 255$ comparisons for an optimum decoder. As $q = 0$, the partition level is $p = 4$, and from Table 6(a) we have $m_0 = 3$, $m_1 = 1$, and $m_2 = m_3 = 0$. Therefore a suboptimum decoder only requires $\sum_{i=0}^{3}(2^{3-m_i} - 1) = 17$ comparisons. Thus the number of comparisons is reduced by a factor of $255/17 = 15$ in going from an optimum to a suboptimum decoder.
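The comparison counts in this example can be reproduced directly. In this Python sketch the value $k = 11$ is inferred from the rate $R = 11/12$ of Table 12(a) (consistent with $2^{k-\tilde k} = 2^8$), and the function names are illustrative.

```python
from typing import Sequence


def optimum_comparisons(k: int, k_tilde: int) -> int:
    """Comparisons to select the best of the 2**(k - k_tilde) parallel transitions."""
    return 2 ** (k - k_tilde) - 1


def suboptimum_comparisons(L: int, m: Sequence[int]) -> int:
    """Comparisons for the suboptimum subset decoder: sum over i of (2**(L - m_i) - 1)."""
    return sum(2 ** (L - mi) - 1 for mi in m)


# 3.67 bit/T 6-D 16PSK code of Table 12(a): k = 11, k_tilde = 3, L = 3,
# with (m0, m1, m2, m3) = (3, 1, 0, 0) at partition level p = 4 of Table 6(a)
opt = optimum_comparisons(11, 3)
sub = suboptimum_comparisons(3, [3, 1, 0, 0])
print(opt, sub, opt // sub)   # 255 17 15
```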

4.7 Discussion 

The asymptotic coding gains for all the codes obtained have been plotted against the complexity factor $\beta = \nu + \tilde k - \log_2 L$ in Figures 12 and 13. Note that these graphs do not take into account complexity due to parallel transitions. In Figure 12(a), a plot of all the best codes found for 8PSK modulation and 2 bit/T is shown. The 2-D codes are from [23]. Notice that the one-state or "uncoded" codes are shown as well. Although the multi-D codes with one state have negative complexity, the 8-D uncoded case has a coding gain above 0 dB. These one-state codes correspond to simple block coded modulation schemes, which have recently become an active research area [13-16]. Codes marked with an asterisk are the best codes found in an incomplete code search. Those marked with a question mark are an attempt to predict the coding gains of higher complexity codes that have yet to be found. A set of prediction rules was used, where

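$$d^2_{\rm free} \le \min\left(2\Delta^2_{q+1} + (\nu-\tilde k)\Delta^2_q,\ \Delta^2_{\tilde k+q+1}\right) \quad \text{for } \Delta^2_q + \Delta^2_{q+1} \le \Delta^2_{q+2},\ \nu \ge \tilde k,$$

or

$$d^2_{\rm free} \le \min\left(\Delta^2_{q+1} + \Delta^2_{q+2} + (\nu-\tilde k-1)\Delta^2_q,\ \Delta^2_{\tilde k+q+1}\right) \quad \text{for } \Delta^2_q + \Delta^2_{q+1} > \Delta^2_{q+2},\ \nu \ge \tilde k+1.$$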
These rules attempt to predict the free distance based on observations of how $d^2_{\rm free}$ increased in the code tables and on a knowledge of the incremental SED leaving or entering a state. For large $\nu$, these rules can be tightened when there is not an equality. For example, we would expect that for $\nu = 5$ the 4-D 8PSK rate 5/6 code would have $d^2_{\rm free} \le 7\Delta^2_0 = 4.101$. However, in reality the equality is not reached, since $d^2_{\rm free} = 6\Delta^2_0$. Thus the former equation above should be modified to

$$d^2_{\rm free} \le \min\left(2\Delta^2_{q+1} + (\nu-\tilde k-1)\Delta^2_q,\ \Delta^2_{\tilde k+q+1}\right).$$

This technique was used with general success in the code search to predict the values of $\tilde k$ for each $\nu$. Thus, with a few calculations by hand, an idea of how long a code search will take, as well as the achievable coding gain, can be obtained before doing the actual code search.
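The prediction rules can be evaluated mechanically from a partition table. The Python sketch below implements the two rules and the tightened form as stated, treating any partition level beyond the listed chain as having infinite distance; that boundary handling and the function name are assumptions. With the Table 2 distances it gives $d^2_{\rm free} = 4.0$ for the $\nu = 2$ and $\nu = 3$ rate 4/5 codes of Table 8(b) ($\tilde k = 1$ and $2$), and 5.172 for $\nu = 4$ if $\tilde k = 3$ is assumed for that code.

```python
import math
from typing import Optional, Sequence


def predict_d2_free(delta2: Sequence[float], q: int, nu: int, k_tilde: int,
                    tightened: bool = False) -> Optional[float]:
    """Predicted upper bound on d2_free from the partition MSSDs delta2[p].

    Returns None when the condition on nu for the applicable rule is not met.
    """
    def d(p: int) -> float:
        return delta2[p] if p < len(delta2) else math.inf   # beyond the chain: no limit

    parallel = d(k_tilde + q + 1)          # distance between parallel transitions
    if d(q) + d(q + 1) <= d(q + 2):
        extra = 1 if tightened else 0      # tightened form for large nu
        if nu < k_tilde + extra:
            return None
        return min(2 * d(q + 1) + (nu - k_tilde - extra) * d(q), parallel)
    if nu < k_tilde + 1:
        return None
    return min(d(q + 1) + d(q + 2) + (nu - k_tilde - 1) * d(q), parallel)


# MSSDs of the 4-D 8PSK partition (Table 2), levels p = 0 ... 6
DELTA2_4D_8PSK = [0.586, 1.172, 2.0, 4.0, 4.0, 8.0, math.inf]
print(predict_d2_free(DELTA2_4D_8PSK, q=1, nu=2, k_tilde=1))   # rate 4/5, nu = 2
print(predict_d2_free(DELTA2_4D_8PSK, q=1, nu=4, k_tilde=3))   # rate 4/5, nu = 4
```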

Also note from Figure 12(a) that for good codes with $\nu = 2$, as $L$ increases the complexity decreases and $\gamma$ increases, eventually reaching 3.0 dB for $L = 4$. Thus, for the 8-D signal set, the complexity factor can be reduced by a factor of four, while maintaining $\gamma$, compared to the rate 2/3 code with $\nu = 2$. Beyond $\beta = 4$ (and $\gamma = 3.0$ dB), increases in coding gain are possible with the new codes that have been found. For $L = 1$, the rate of increase of $\gamma$ with $\beta$ seems to be slower than for $L = 4$. With $L = 4$, a "code barrier" of $\gamma = 6.0$ dB will be reached due to the nature of the set partitioning. It would seem that very complex codes are required ($\beta > 14$) if this 6.0 dB limit is to be broken. The codes for $L = 2$ also seem to trend towards this barrier. Although this code barrier appears difficult to break, the codes found so far indicate that it can be reached faster than for $L = 1$, perhaps with a complexity factor reduction of four. These large complexity codes may be of interest in deep-space communication systems. The effort needed to build such a system may be justified, as indicated by the extremely large Viterbi decoder now being constructed at JPL [24].

Figure 13(a) compares the 4-D codes with 3 bit/T and 16PSK modulation to the equivalent 2-D codes [23]. For low $\beta$, the same effect observed for 8PSK and 2 bit/T seems to be occurring; that is, $\beta$ decreases and $\gamma$ increases as $L$ increases. Between $\beta = 3$ and $\beta = 9$ the codes are close together, with perhaps a divergence at $\beta = 10$ as indicated. In Figure 12(b) a variety of curves are shown for 8PSK modulation. Notably, the same low-$\beta$ effect occurs for the curves at 2.5 bit/T. The other two curves have a rate of $(IL-1)/IL$ (as do the 2.5 bit/T curves). They start off at $\gamma = 0$ (for $\nu = 1$) and increase steadily. The curves for 3.5 and 3.67 bit/T and 16PSK modulation (Figure 13(b)) seem to follow the same type of pattern as the 2.5 and 2.67 bit/T curves, respectively.

In Figure 12(c), codes of rates $[(I-1)L+1]/[(I-1)L+2]$ with 2.25 and 2.33 bit/T and 8PSK modulation are shown. These rates seem to be characterized by a quick increase of $\gamma$ with $\beta$ and then a levelling off between $\gamma = 3$ and $\gamma = 4$ dB. The apparent low coding gains are due to the fairly large $d^2_u$ they are compared with. The 3.33 bit/T codes with 16PSK modulation in Figure 13(b) also seem to follow a similar pattern to the codes in Figure 12(c).

Rate $k/(k+1)$, 2L-D TC-MPSK codes also have the potential advantage of being used as inner codes in a high rate concatenated coding system with Reed-Solomon (RS) outer codes over GF($2^k$). If the inner decoder makes errors, one trellis branch error will exactly match one symbol in the outer RS code word. It is shown in [12] that the symbol-oriented nature of multi-D TC-MPSK inner codes can provide an improvement of up to 1 dB in the overall performance of a concatenated coding system when these codes replace bit-oriented 2-D TC-MPSK inner codes of the same rate.
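The symbol-matching argument can be illustrated by counting how many RS symbols a single trellis branch error touches. In this Python sketch the worst case (all $k$ bits of one branch wrong) and the bit offset used to model symbol boundaries that are not aligned with trellis branches are illustrative assumptions, not results from [12]; the function name is hypothetical.

```python
from typing import Iterable


def rs_symbols_hit(error_bits: Iterable[int], k: int, offset: int = 0) -> int:
    """Number of k-bit RS symbols containing at least one of the given bit errors.

    Symbol j covers bit positions j*k - offset ... (j+1)*k - offset - 1, so
    offset = 0 aligns the symbols with trellis branches of k information bits.
    """
    return len({(pos + offset) // k for pos in error_bits})


k = 11                                  # e.g. a rate 11/12 multi-D code
branch_error = range(k, 2 * k)          # every bit of trellis branch 1 in error
print(rs_symbols_hit(branch_error, k))             # aligned: 1 symbol
print(rs_symbols_hit(branch_error, k, offset=5))   # misaligned: 2 symbols
```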
5 Conclusions


A means of systematically constructing multi-dimensional MPSK signal sets has been described. When these signal sets are combined with trellis coded modulation to form a rate $k/(k+1)$ code, significant asymptotic coding gains in comparison to an uncoded system can be achieved. These codes provide a number of significant advantages compared to trellis codes with 2-D signal sets. Most importantly, $R_{\rm eff}$ can vary from $I-1$ to $I-1/L$ bit/T, allowing the coding system designer a greater choice in data rate without sacrificing data quality. As $R_{\rm eff}$ approaches $I$, though, increased coding effort (in terms of decoder complexity) or higher SNR is required to achieve the same data quality.

Since the signal sets have been systematically constructed by using block 
code cosets, and systematic convolutional coding is used, a powerful total encoder 
system concept results. This approach has led to the construction of signal sets 
that allow codes to be transparent to discrete 360°/M phase rotations, in amounts
depending on the code and the signal set used. In general, it has been found that 
increasing phase transparency usually results in a decrease in steady state code 
performance, due to an increase in the number of nearest neighbors or a decrease 
in free distance. A complete encoder system, from the differential precoder to the 
2-D signal mapper, is presented, allowing an easy application of these codes. 

Another advantage is decoder complexity. Using the Viterbi algorithm, very high bit rates can be achieved due to the high values of $k$ compared to convolutional codes that map into a 2-D signal set only. The many branch metric computations in the Viterbi decoder can be reduced either through the use of a suboptimal comparison technique or large look-up tables. Multi-D codes are also
suited for concatenated coding with a Reed-Solomon outer code. A synergistic 
effect is obtained, since the multi-D codes tend to produce errors in blocks, which 
are matched to the RS code symbol size. 

Finally, this method of set partitioning and code construction can be applied 
to other signal sets such as QAM or QPSK. It is expected that similar coding 
gains will be achieved in comparison to existing codes with multi-D QAM signal 
sets. However, the advantages of the systematic approach described in this paper, 
we believe, will lead to faster acceptance and utilization of these multi-D codes. 


6 References 


[1] G. Ungerboeck, “Channel coding with multilevel/phase signals,” IEEE Trans. Inform. Theory, Vol. IT-28, pp. 55-67, January 1982.

[2] G. D. Forney, Jr., R. G. Gallager, G. R. Lang, F. M. Longstaff, and S. U. Qureshi, “Efficient modulation for band-limited channels,” IEEE J. Selected Areas in Comm., Vol. SAC-2, pp. 632-647, Sept. 1984.

[3] A. R. Calderbank and J. E. Mazo, “A new description of trellis codes,” IEEE Trans. Inform. Theory, Vol. IT-30, pp. 784-791, Nov. 1984.

[4] A. R. Calderbank and N. J. A. Sloane, “Four-dimensional modulation with an eight state trellis code,” AT&T Tech. Journal, Vol. 64, pp. 1005-1017, May-June 1985.

[5] L. F. Wei, “Trellis-coded modulation with multi-dimensional constellations,” IEEE Trans. Inform. Theory, Vol. IT-33, pp. 483-501, July 1987.

[6] A. LaFanechere and D. J. Costello, Jr., “Multidimensional coded PSK systems using unit-memory trellis codes,” Proc. Allerton Conf. on Commun., Cont., and Comput., pp. 428-429, Monticello, IL, Sept. 1985.

[7] S. G. Wilson, “Rate 5/6 trellis-coded 8-PSK,” IEEE Trans. Commun., Vol. COM-34, pp. 1045-1049, Oct. 1986.

[8] A. R. Calderbank and N. J. A. Sloane, “New trellis codes based on lattices and cosets,” IEEE Trans. Inform. Theory, Vol. IT-33, pp. 177-195, March 1987.

[9] G. D. Forney, Jr., “Coset codes,” IEEE Trans. Inform. Theory, to appear.

[10] D. P. Taylor and H. C. Chan, “A simulation of two bandwidth efficient modulation techniques,” IEEE Trans. Commun., Vol. COM-29, pp. 267-275, March 1981.

[11] S. G. Wilson, H. A. Sleeper, P. J. Schottler, and M. T. Lyons, “Rate 3/4 convolutional coding of 16PSK: code design and performance study,” IEEE Trans. Commun., Vol. COM-32, pp. 1308-1315, Dec. 1984.

[12] R. H. Deng and D. J. Costello, Jr., “High rate concatenated coding systems using multi-dimensional bandwidth efficient trellis inner codes,” IEEE Trans. Commun., to appear.





[13] S. I. Sayegh, “A class of optimum block codes in signal space,” IEEE Trans. Commun., Vol. COM-34, pp. 1043-1045, Oct. 1986.

[14] E. L. Cusack, “Error control codes for QAM signalling,” Electronics Letters, Vol. 20, No. 2, pp. 62-63, 19 Jan. 1984.

[15] R. M. Tanner, “Algebraic construction of large Euclidean distance combined coding/modulation systems,” Computer Research Laboratory Technical Report, University of California, UCSC-CRL-87-7, 5 June 1987.

[16] S. Lin, “Bandwidth efficient block codes for M-ary PSK modulation,” NASA Technical Report, Grant No. NAG 5-931, University of Hawaii, Dec. 1987.

[17] M. Rouanne, “Distance bounds and construction algorithms for trellis codes,” PhD dissertation, University of Notre Dame, April 1988.

[18] K. J. Larsen, “Comments on ‘An efficient algorithm for computing free distance’,” IEEE Trans. Inform. Theory, Vol. IT-18, pp. 437-439, May 1972.

[19] G. D. Forney, Jr., private communication, March 1987.

[20] D. Divsalar and M. K. Simon, “Multiple trellis coded modulation (MTCM),” IEEE Trans. Commun., Vol. COM-36, No. 4, pp. 410-419, April 1988.

[21] F. Hemmati and R. J. F. Fang, “Low complexity coding methods for high data rate channels,” Comsat Technical Review, Vol. 16, pp. 425-447, Fall 1986.

[22] S. S. Pietrobon, “Rotationally invariant convolutional codes for MPSK modulation and implementation of Viterbi decoders,” Masters Thesis, South Australian Institute of Technology, June 1988.

[23] G. Ungerboeck, “Trellis-coded modulation with redundant signal sets,” IEEE Commun. Magazine, Vol. 25, pp. 5-21, Feb. 1987.

[24] O. Collins, “Techniques for long constraint length Viterbi decoding,” California Institute of Technology, submission to IEEE Information Theory Symposium, Kobe City, Japan, June 1988.




Table 1: A 2 x 3 Binary Matrix Space Partition

Partition   Principal          Generator Matrices      Coset Representatives
Level (p)   Subsets            G_m2   G_m1   G_m0      (r^p)^T
0           Ω(C0, C0, C0)      G0     G0     G0        -
1           Ω(C0, C0, C1)      G0     G0     G1        (01)
2           Ω(C0, C0, C2)      G0     G0     -         (11)
3           Ω(C0, C1, C2)      G0     G1     -         (02)
4           Ω(C0, C2, C2)      G0     -      -         (22)
5           Ω(C1, C2, C2)      G1     -      -         (04)
6           Ω(C2, C2, C2)      -      -      -         (44)

Note: G0 = [1 0; 0 1], G1 = [1 1], r1 = [0 1]^T, r2 = [1 1]^T.


Table 2: A 4-D 8PSK Signal Set Partition

Partition   Principal          MSSD      Coset Representatives
Level (p)   Subsets            (Δ²_p)    (r^p)^T
0           P(C0, C0, C0)      0.586     -
1           P(C0, C0, C1)      1.172     (01)
2           P(C0, C0, C2)      2         (11)
3           P(C0, C1, C2)      4         (02)
4           P(C0, C2, C2)      4         (22)
5           P(C1, C2, C2)      8         (04)
6           P(C2, C2, C2)      ∞         (44)

C0: (2,2) code, G0 = [1 0; 0 1], d0 = 1;
C1: (2,1) code, G1 = [1 1], d1 = 2; r1 = [0 1]^T
C2: (2,0) code, d2 = ∞; r2 = [1 1]^T
Table 3(a): 6-D 8PSK Signal Set Partition I

Partition   Principal          MSSD      Coset Representatives
Level (p)   Subsets            (Δ²_p)    (r^p)^T
0           P(C0, C0, C0)      0.586     -
1           P(C0, C0, C1)      1.172     (111)
2           P(C0, C0, C2)      1.172     (110)
3           P(C0, C0, C3)      2         (011)
4           P(C0, C1, C3)      4         (222)
5           P(C0, C2, C3)      4         (220)
6           P(C0, C3, C3)      4         (022)
7           P(C1, C3, C3)      8         (444)
8           P(C2, C3, C3)      8         (440)
9           P(C3, C3, C3)      ∞         (044)

C0: (3,3) code, G0 = [1 0 0; 0 1 0; 0 0 1], d0 = 1;
C1: (3,2) code, G1 = [1 0 1; 0 1 1], d1 = 2; r1 = [1 1 1]^T
C2: (3,1) code, G2 = [1 0 1], d2 = 2; r2 = [1 1 0]^T
C3: (3,0) code, d3 = ∞; r3 = [0 1 1]^T

p0 = 1, p1 = 4, p2 = [illegible]
Table 3(b): 6-D 8PSK Signal Set Partition II

Partition   Principal          MSSD      Coset Representatives
Level (p)   Subsets            (Δ²_p)    (r^p)^T
0           P(C0, C0, C0)      0.586     -
1           P(C0, C0, C'1)     0.586     (001)
2           P(C0, C0, C'2)     1.757     (011)
3           P(C0, C0, C3)      2         (111)
4           P(C0, C1, C3)      4         (222)
5           P(C0, C2, C3)      4         (220)
6           P(C0, C3, C3)      4         (022)
7           P(C1, C3, C3)      8         (444)
8           P(C2, C3, C3)      8         (440)
9           P(C3, C3, C3)      ∞         (044)

C0: (3,3) code, G0 = [1 0 0; 0 1 0; 0 0 1], d0 = 1;
C'1: (3,2) code, G'1 = [illegible], d'1 = 1; r'1 = [0 0 1]^T
C'2: (3,1) code, G'2 = [1 1 1], d'2 = 3; r'2 = [0 1 1]^T
C3: (3,0) code, d3 = ∞; r'3 = [1 1 1]^T
Other codes are from Table 3(a).

p0 = 3, p1 = 4, p2 = 7
Table 4: 8-D 8PSK Signal Set Partition

Partition   Principal          MSSD      Coset Representatives
Level (p)   Subsets            (Δ²_p)    (r^p)^T
0           P(C0, C0, C0)      0.586     -
1           P(C0, C0, C1)      1.172     ?
2           P(C0, C0, C2)      1.172     ?
3           P(C0, C0, C3)      2         ?
4           P(C0, C1, C3)      2.343     ?
5           P(C0, C1, C4)      4         (1111)
6           P(C0, C2, C4)      4         ?
7           P(C0, C3, C4)      4         ?
8           P(C1, C3, C4)      8         ?
9           P(C1, C4, C4)      8         (2222)
10          P(C2, C4, C4)      8         (0044)
11          P(C3, C4, C4)      16        (0404)
12          P(C4, C4, C4)      ∞         (4444)

C0: (4,4) code, G0 = [1 0 0 0; 0 1 0 0; 0 0 1 0; 0 0 0 1], d0 = 1;
C1: (4,3) code, G1 = [1 0 0 1; 0 1 0 1; 0 0 1 1], d1 = 2; r1 = [0 0 0 1]^T
C2: (4,2) code, G2 = [1 0 1 0; 0 1 0 1], d2 = 2; r2 = [0 0 1 1]^T
C3: (4,1) code, G3 = [1 1 1 1], d3 = 4; r3 = [0 1 0 1]^T
C4: (4,0) code, d4 = ∞; r4 = [1 1 1 1]^T

p0 = 5, p1 = 9, p2 = 12.
Table 5: 4-D 16PSK Signal Set Partition

Partition   Principal              MSSD      Coset Representatives
Level (p)   Subsets                (Δ²_p)    (r^p)^T
0           P(C0, C0, C0, C0)      0.152     -
1           P(C0, C0, C0, C1)      0.304     (01)
2           P(C0, C0, C0, C2)      0.586     (11)
3           P(C0, C0, C1, C2)      1.172     (02)
4           P(C0, C0, C2, C2)      2         (22)
5           P(C0, C1, C2, C2)      4         (04)
6           P(C0, C2, C2, C2)      4         (44)
7           P(C1, C2, C2, C2)      8         (08)
8           P(C2, C2, C2, C2)      ∞         (88)

Codes are from Table 2.

p0 = 2, p1 = 4, p2 = 6, p3 = 8
Table 6(a): 6-D 16PSK Signal Set Partition I

Partition   Principal              MSSD      Coset Representatives
Level (p)   Subsets                (Δ²_p)    (r^p)^T
0           P(C0, C0, C0, C0)      0.152     -
1           P(C0, C0, C0, C1)      0.304     (111)
2           P(C0, C0, C0, C2)      0.304     (110)
3           P(C0, C0, C0, C3)      0.586     (011)
4           P(C0, C0, C1, C3)      1.172     (222)
5           P(C0, C0, C2, C3)      1.172     (220)
6           P(C0, C0, C3, C3)      2         (022)
7           P(C0, C1, C3, C3)      4         (444)
8           P(C0, C2, C3, C3)      4         (440)
9           P(C0, C3, C3, C3)      4         (044)
10          P(C1, C3, C3, C3)      8         (888)
11          P(C2, C3, C3, C3)      8         (880)
12          P(C3, C3, C3, C3)      ∞         (088)

Codes are from Table 3(a).

p0 = 1, p1 = 4, p2 = 7, p3 = 10
Table 6(b): 6-D 16PSK Signal Set Partition II

Partition   Principal              MSSD      Coset Representatives
Level (p)   Subsets                (Δ²_p)    (r^p)^T
0           P(C0, C0, C0, C0)      0.152     -
1           P(C0, C0, C0, C'1)     0.152     (001)
2           P(C0, C0, C0, C'2)     0.457     (011)
3           P(C0, C0, C0, C3)      0.586     (111)
4           P(C0, C0, C1, C3)      1.172     (222)
5           P(C0, C0, C2, C3)      1.172     (220)
6           P(C0, C0, C3, C3)      2         (022)
7           P(C0, C1, C3, C3)      4         (444)

Codes are from Tables 3(a) and 3(b).
p0 = 3, p1 = 4, p2 = 7, p3 = 10


Table 6(c): 6-D 16PSK Signal Set Partition III

Partition   Principal              MSSD      Coset Representatives
Level (p)   Subsets                (Δ²_p)    (r^p)^T
0           P(C0, C0, C0, C0)      0.152     -
1           P(C0, C0, C0, C'1)     0.152     (001)
2           P(C0, C0, C0, C'2)     0.457     (011)
3           P(C0, C0, C0, C3)      0.586     (111)
4           P(C0, C0, C'1, C3)     0.586     (002)
5           P(C0, C0, C'2, C3)     1.757     (022)
6           P(C0, C0, C3, C3)      2         (222)
7           P(C0, C1, C3, C3)      4         (444)

Codes are from Tables 3(a) and 3(b).
p0 = 3, p1 = 6, p2 = 7, p3 = 10.
Table 7: Squared Euclidean Weights Used in Code Search for
Rate 7/8 Code with 6-D 8PSK Signal Set II (k̃ = 2).

e_n^k̃    w²(e_n^k̃)    m(e_n^k̃)    ?
000       0.0          1           1
001       1.172        3           2
010       1.757        9           4
011       0.586        3           1
100       2.0          16          6
101       1.172        12          2
110       1.757        18          4
111       0.586        6           1
Table 8: 4-D (L=2) Trellis Coded 8PSK

8(a) R = 5/6, R_eff = 2.5 bit/T, d²_u = 1.172

ν    d²_free   N(d²_free)   d²_next   N(d²_next)   γ (dB)   h²    h¹    h⁰    Transparency (2^d ψ)
1    1.757     81           -         -            1.76     -     2     3     ?
2    2.0       6            2.929     243          2.32     -     2     5     90°
3    2.929     198          -         -            3.98     ?     ?     13    45°
               180          -         -                     ?     ?     17    ?
4    3.515     2079         -         -            4.77     ?     ?     ?     45°
               1944         -         -                     ?     ?     ?     90°
5    3.515     252          -         -            4.77     32    22    57    45°
               144          -         -                     26    04    53    90°
6    4.0       6            4.101     5058         5.33     004   030   127   45°
                                      2160                  060   004   127   90°

γ_4 = −1.35 dB, γ_F = 0.23 dB


8(b) R = 4/5, R_eff = 2 bit/T, d²_u = 2.0

ν    d²_free   N(d²_free)   d²_next   N(d²_next)   γ (dB)   h³    h²    h¹    h⁰    Transparency (2^d ψ)
1    3.172     36           -         -            2.00     -     -     2     3     45°
2    4.0       6            5.172     108          3.01     -     -     2     5     45°
3    4.0       2            5.172     64           3.01     -     04    02    17    ?
4    5.172     34           -         -            4.13     ?     14    ?     25    ?
               30           -         -                     ?     04    ?     23    180°
5    6.0       6            -         -            4.77     14    24    06    43    ?
6    6.343     56           -         -            5.01     070   044   046   143   ?
               45           -         -                     070   034   076   105   180°

γ_4 = 0 dB, γ_F = 0 dB
Table 9: 6-D (L=3) Trellis Coded 8PSK

9(a) R = 8/9, R_eff = 2.67 bit/T, d²_u = 1.172

ν    d²_free   N(d²_free)   d²_next   N(d²_next)   γ (dB)   h³    h²    h¹    h⁰    Transparency (2^d ψ)
1    1.172     15           1.757     1512         0.0      -     -     2     3     45° (I)
2    1.757     432          -         -            1.76     -     -     2     5     45° (II)
3    2.0       16           2.343     225          2.32     -     04    02    11    45° (I)
4    2.343     81           -         -            3.01     ?     ?     02    23    ?
               63           -         -                     ?     ?     04    21    ?
5    2.929     3969         -         -            3.98     14    24    02    77    90° (I)
6*   2.929     594          -         -            3.98     066   026   012   101   90° (I)

γ_4 = −1.07 dB, γ_F = 1.14 dB

* Search incomplete.


9(b) R = 7/8, R_eff = 2.33 bit/T, d²_u = 1.757

ν    d²_free   N(d²_free)   d²_next   N(d²_next)   γ (dB)   h⁴    h³    h²    h¹    h⁰    Transparency (2^d ψ)
1    2.0       16           2.343     243          0.56     -     -     -     2     3     90° (II)
2    2.586     48           -         -            1.68     -     -     6     4     7     90° (II)
3    3.757     144          -         -            3.30     -     -     04    02    11    180° (II)
4    4.0       19           4.343     432          3.57     -     -     14    02    33    90° (II)
5    4.0       7            4.343     360          3.57     -     30    14    26    ?     ?
                                      252                   -     16    34    06    ?     ?
6*   4.0       3            4.343     260          3.57     074   14    024   002   101   90° (II)

γ_4 = 0.11 dB, γ_F = 1.10 dB
9(c) R = 6/7, R_eff = 2.0 bit/T, d²_u = 2.0

ν    d²_free   N(d²_free)   d²_next   N(d²_next)   γ (dB)   h⁴    h³    h²    h¹    h⁰    Transparency (2^d ψ)
1    3.757     288          -         -            2.74     -     -     -     2     3     180° (II)
2    4.0       19           5.757     2304         3.01     -     -     -     2     5     180° (II)
3    4.0       7            5.757     ?            3.01     -     -     ?     06    13    ?
                                                            -     -     ?     02    17    ?
4    4.0       3            5.757     576          3.01     -     10    ?     06    25    90° (II)
                                      528                   -     10    ?     ?     33    180° (II)
5*   5.757     272          -         -            4.59     60    24    20    06    ?     180° (II)
6*   5.757     80           -         -            4.59     060   050   006   002   ?     180° (II)

γ_4 = 0 dB, γ_F = 0 dB
Table 10: 8-D (L=4) Trellis Coded 8PSK

10(a) R = 11/12, R_eff = 2.75 bit/T, d²_u = 1.172

ν    d²_free   N(d²_free)   d²_next   N(d²_next)   γ (dB)   h³    h²    h¹    h⁰    Transparency (2^d ψ)
1    1.172     54           1.757     14580        0.0      -     -     2     3     45°
2    1.757     1944         -         -            1.76     -     6     4     5     45°
3    2.0       26           2.343     2916         2.32     -     04    02    11    45°
4    2.343     963          -         -            3.01     10    06    04    21    45°
5    2.343     27           2.929     110376       3.01     34    16    10    45    45°

γ_4 = −0.94 dB, γ_F = 1.60 dB


10(b) R = 10/11, R_eff = 2.5 bit/T, d²_u = 1.172

ν    d²_free   N(d²_free)   d²_next   N(d²_next)   γ (dB)   h³    h²    h¹    h⁰    Transparency (2^d ψ)
1    2.0       26           2.343     5832         2.32     -     -     2     3     45°
2    2.343     2043         -         -            3.01     -     4     6     7     45°
3    2.343     27           3.172     312          3.01     -     04    02    11    45°
4    3.172     78           -         -            4.33     14    04    02    21    45°
5    3.515     4779         -         -            4.77     ?     ?     ?     ?     45°
               4428         -         -                     ?     ?     ?     ?     90°
6*   4.0       52           4.343     9828         5.33     ?     ?     ?     ?     45°

γ_4 = −1.35 dB, γ_F = 0.23 dB
10(c) R = 9/10, R_eff = 2.25 bit/T, d²_u = 2.0

ν    d²_free   N(d²_free)   d²_next   N(d²_next)   γ (dB)   h⁴    h³    h²    h¹    h⁰    Transparency (2^d ψ)
1    2.343     27           3.172     1092         0.69     -     -     -     2     3     45°
2    3.172     156          -         -            2.00     -     -     6
Table 11: 4-D (L=2) Trellis Coded 16PSK
Table 12: 6-D (L=3) Trellis Coded 16PSK
Figure 2: A 2/2-way partition chain C0/C1/C2 in the 2-D binary vector space.
Note: for p = 3, m2 = 0, C_m2(z) = C0; m1 = 1, C_m1(z) = C1 ⊕ [illegible]^T; m0 = 2, C_m0(z) = C2 ⊕ [illegible]^T.

Figure 3: A 2/2/2 partition chain Ω(C0, C0, C0)/Ω(C0, C0, C1)/Ω(C0, C0, C2)/Ω(C0, C1, C2) in the 2 x 3 binary matrix space.
Note: At p = 3, m2 = 0, C_m2(z) = C0; m1 = 1, C_m1(z) = C1 ⊕ [0, z² ⊕ z⁰z¹]^T; m0 = 2, C_m0(z) = C2 ⊕ [z¹, z⁰ ⊕ z¹]^T.

Figure 4: A three level 4-D 8PSK signal set partition.


Note: For p = 3, m3 = 0, C_m3(z) = C0; m2 = 0, C_m2(z) = C0; m1 = 1, C_m1(z) = C1 ⊕ [0, z² ⊕ z⁰z¹]^T; m0 = 2, C_m0(z) = C2 ⊕ [z¹, z⁰ ⊕ z¹]^T.

Figure 5: A three level 4-D 16PSK signal set partition.
Abstract

The performance of a trellis code can be accurately predicted from its distance spectrum. A class of quasi-regular codes is defined for which the distance spectrum can be calculated from the codeword corresponding to the all-zero information sequence. An algorithm to compute the distance spectrum of linear, regular, and quasi-regular trellis codes is presented. In particular, it can calculate the weight spectrum of convolutional (linear trellis) codes and the distance spectrum of most of the best known trellis codes. The codes do not have to be linear or regular. The algorithm is a bidirectional stack algorithm. We use the algorithm to calculate the beginning of the distance spectrum of some of the best known trellis codes and to compute tight estimates on the first event error probability and on the bit error probability.


1 Introduction 

The performance of a trellis code depends on the decoding algorithm employed and on the 
distance properties of the code, i.e., the distances between codewords. The exact error 
probability of a coded system cannot be calculated, even for simple trellis codes. However, 
trellis code error probabilities can be estimated using simulations and performance bounds. 
Simulations often require long running times and are only useful for short constraint length codes. Performance bounds are the most common means of estimating the error probability of codes and of designing new coding schemes. The distance spectrum can be used to compute performance bounds.

This work was supported by NASA Grant NAG5-557. Part of this material was presented at the Conference on Information Sciences and Systems, Princeton, NJ, March 1988.

A trellis code, or trellis coded modulation (TCM), consists of a convolutional encoder followed by a mapper. Figure 1 shows a typical trellis code as originally designed by Ungerboeck [10]. This schematic representation was introduced by Forney et al. [6].

An encoder state is characterized by the values of the past information bits stored in the shift registers of the convolutional encoder. The incoming information bits determine the transitions or branches connecting one state to another. During each signaling interval $\tau$, the $\tilde k$ information bits enter the convolutional encoder and an $n$-bit subset selector $v_\tau = (v_\tau^1, \ldots, v_\tau^n)$ leaves the encoder. Subset selectors depend on the incoming $\tilde k$ information bits and on $\nu$ past information bits only, where $\nu$ is called the constraint length of the trellis code. The mapper transforms the subset selector into a subset of channel signals. The uncoded information bits $u_\tau^{\tilde k+1}, \ldots, u_\tau^{k}$ then select one particular channel signal from the selected subset. The set of possible channel signals is denoted by $S$.

A topological trellis is a trellis with no labels on the branches. A topological path $Y = \{\ldots, Y_\tau, Y_{\tau+1}, \ldots\}$ through a topological trellis is a sequence of consecutive branches $Y_\tau$ which have not yet been assigned a signal. The topological branch $Y_\tau$ is the $\tau$th branch in $Y$ and corresponds to the $\tau$th signaling interval since the beginning of $Y$. A channel path $y$ is defined by a topological path $Y$ and a sequence of labels: $y = \{\ldots, y_\tau, y_{\tau+1}, \ldots\}$, where $y_\tau$ is a branch in $Y$ labeled with a channel signal in $S$. A channel path is a path through a labeled trellis. The length $l$ of a path is the number of consecutive branches that form the path, and it can be finite or infinite. We will sometimes call $y_\tau$ a signal although it is more properly called a labeled branch, and we say that $y_\tau$ is the output signal during the $\tau$th signaling interval. The context should make clear when we mean a labeled branch or a signal.

A labeling of a topological trellis associates a signal with each branch in the trellis. A 
trellis labeling can be seen as the combination of a binary encoder and a mapper. A trellis 
code is uniquely characterized by a topological trellis and a labeling. The most general 
way to define a trellis code is by using a table which assigns a signal to each branch in the 
trellis. In other words, a trellis code is the set of all labeled trellis branches for all signaling 
intervals (to construct a trellis code, one only needs to label trellis branches with channel 
signals). 

Given two signal or subset sequences $y = (\ldots, y_1, y_2, \ldots, y_l, \ldots)$ and $\hat y = (\ldots, \hat y_1, \hat y_2, \ldots, \hat y_l, \ldots)$, the pair $(y, \hat y)$ is a first event error of length $l$ if $Y_\tau = \hat Y_\tau$ for $\tau < 1$, $Y_\tau \neq \hat Y_\tau$ for $1 \le \tau \le l$, and $Y_\tau = \hat Y_\tau$ for $\tau > l$, i.e., the error event starts when the two paths diverge and ends when the two paths remerge for the first time. Figure 2 shows a first event error of length $l$ (the correct and incorrect paths remerge after $l$ branches).

The performance of a trellis code depends on the distribution of distances between encoder output sequences (code words) corresponding to distinct encoder input sequences. In particular, if $y = \{y_1, \ldots, y_\tau, \ldots\}$ and $\hat y = \{\hat y_1, \ldots, \hat y_\tau, \ldots\}$ are sequences of signals from $S$, the squared Euclidean distance $d(y, \hat y)$ satisfies

$$ d(y, \hat y) = \sum_{\tau=1}^{\infty} d(y_\tau, \hat y_\tau), \qquad (1) $$

where $d(y_\tau, \hat y_\tau)$ is the squared Euclidean distance between the two channel signals $y_\tau$ and $\hat y_\tau$. The distance between two code words determines the likelihood of decoding one code word when the other one was sent. For an AWGN channel, the squared Euclidean distance between two signal sequences $y$ and $\hat y$ determines the probability of deciding in favor of $\hat y$ given that $y$ was sent.
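To make (1) concrete, here is a minimal Python sketch. It assumes unit-energy 8-PSK with signal $i$ at angle $2\pi i/8$; this signal set and the helper names are illustrative assumptions, not part of the report. The distance between two sequences is simply the sum of the per-branch squared distances.

```python
import math

def psk_point(i: int, m: int = 8) -> complex:
    """Unit-energy M-PSK signal i, assumed to lie at angle 2*pi*i/m."""
    angle = 2 * math.pi * i / m
    return complex(math.cos(angle), math.sin(angle))

def branch_distance(a: int, b: int, m: int = 8) -> float:
    """Squared Euclidean distance d(y_tau, yhat_tau) between two signals."""
    return abs(psk_point(a, m) - psk_point(b, m)) ** 2

def sequence_distance(y, y_hat, m: int = 8) -> float:
    """Equation (1): sum of the branch distances of two equal-length sequences."""
    if len(y) != len(y_hat):
        raise ValueError("sequences must have the same length")
    return sum(branch_distance(a, b, m) for a, b in zip(y, y_hat))

print(round(branch_distance(0, 1), 3))                    # 0.586 (= 2 - sqrt(2))
print(round(sequence_distance([0, 0, 0], [2, 1, 2]), 3))  # 4.586 (= 2 + 0.586 + 2)
```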

A union bound on the first event error probability $P_e$ of trellis codes may be obtained by summing the error probability over all possible incorrect paths which remerge with all possible correct paths [12]. At any time unit, $P_e$ is bounded by

$$ P_e \le \sum_{d = d_{free}}^{\infty} A_d \, Q\!\left(\sqrt{\frac{d}{2N_0}}\right), \qquad (2) $$

where $d$ represents the squared Euclidean distance between signal sequences, $A_d$ is the average number (multiplicity) of code words at distance $d$ from a specific code word, where the average is taken over all code words in the code, $d_{free}$ is the minimum free squared Euclidean distance of the code, $N_0$ is the one-sided noise power spectral density, and $Q(\cdot)$ is the Gaussian integral function $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-t^2/2}\, dt$. Equation (2) can be rewritten as

$$ P_e \le \sum_{d = d_{free}}^{\infty} A_d P_d, $$

where $P_d = Q\big(\sqrt{d/(2N_0)}\big)$ is the two code word error probability for distance $d$.

The bit error probability $P_b$ is the average number of bit errors per decoded information bit. Equation (2) can be modified to provide a bound on $P_b$ by weighting each term $P_d$ by the average number $B_d$ of information bits on all paths at distance $d$ from the correct path [12]. Hence, at any time unit, $P_b$ is bounded by

$$ P_b \le \sum_{d = d_{free}}^{\infty} B_d P_d. \qquad (3) $$
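The following minimal sketch shows how (2) and (3) are evaluated once a finite number of spectral lines $(d, A_d, B_d)$ is known. It uses the standard identity $Q(x) = \tfrac12\,\mathrm{erfc}(x/\sqrt2)$ and assumes that distances and $N_0$ are expressed in the same energy units; the function names are hypothetical. Keeping only the first few lines gives an estimate rather than a strict bound, because the omitted terms are positive.

```python
import math

def q_function(x: float) -> float:
    """Gaussian integral Q(x), computed as 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def pairwise_error(d: float, n0: float) -> float:
    """P_d = Q(sqrt(d / (2 N0))) for a squared Euclidean distance d."""
    return q_function(math.sqrt(d / (2 * n0)))

def union_bounds(spectrum, n0: float):
    """Truncated versions of (2) and (3).

    spectrum: iterable of (d, A_d, B_d) spectral lines.
    Returns (P_e estimate, P_b estimate) from the supplied lines only."""
    lines = list(spectrum)
    pe = sum(a_d * pairwise_error(d, n0) for d, a_d, _ in lines)
    pb = sum(b_d * pairwise_error(d, n0) for d, _, b_d in lines)
    return pe, pb
```

In use, the first few lines of a computed spectrum are passed as `union_bounds([(d, A_d, B_d), ...], n0)`; at high SNR the first term dominates, which is why the free distance line alone already gives a rough estimate.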
A spectral line is defined by a distance $d$ and its average multiplicity $A_d$. The set of all spectral lines is called the distance spectrum of the code. If the code is linear, $A_d$ is the number of code vectors of weight $d$ in the code, and the distance spectrum is commonly called the weight spectrum of the code [8].

The first event error probability can be estimated in terms of the free distance $d_{free}$ of the code. Trellis codes with a large free distance are optimum at large signal-to-noise ratios (SNR). At moderate SNR, the optimality of the code depends on the first few spectral lines of the code, especially for non-binary, non-regular trellis codes whose distance spectra are relatively dense. This means that the codes with the best free distance may not be the codes that perform the best for moderate SNR.


2 Quasi-Regularity

Given a signal set $S$ and an equivalence relation $(R)$ defined between elements of $S$, two elements $x$ and $y$ in $S$ belong to the same equivalence class of $(R)$ if and only if they satisfy $(R)$, which is denoted $xRy$. An equivalence sequence $(R_1)/(R_2)/\cdots/(R_r)$ on $S$ is a set of $r$ equivalence relations defined on elements of $S$ which satisfy

$$ \forall\, i, j,\ 1 \le i < j \le r,\quad \forall\, x, y \in S, \qquad x R_j y \implies x R_i y, \qquad (4) $$

where $r$ is the number of levels in the sequence. The equivalence classes generated by $(R_i)$ are called subsets of $S$ at level $i$ and form a partition chain of the signal set.
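As an illustration of (4), the sketch below uses one concrete equivalence sequence, assumed here rather than taken from the report: 8-PSK with natural labels, where $x R_i y$ holds iff the $i$ least significant label bits agree (i.e., $x \equiv y \bmod 2^i$). It checks property (4) by brute force and lists the subsets at each level, which form the familiar 8-PSK / QPSK / BPSK partition chain.

```python
from itertools import product

M = 8        # 8-PSK with natural labels 0..7 (assumption for illustration)
LEVELS = 3   # r = 3 levels: subsets of 4, 2, and 1 signals

def related(x: int, y: int, level: int) -> bool:
    """x R_level y: the `level` least significant label bits agree."""
    return (x - y) % (2 ** level) == 0

def is_equivalence_sequence(m: int, r: int) -> bool:
    """Check (4): for all i < j and all x, y in S, x R_j y implies x R_i y."""
    for i in range(1, r + 1):
        for j in range(i + 1, r + 1):
            for x, y in product(range(m), repeat=2):
                if related(x, y, j) and not related(x, y, i):
                    return False
    return True

def subsets(m: int, level: int):
    """Equivalence classes of R_level, i.e., the subsets of S at that level."""
    return [[x for x in range(m) if x % (2 ** level) == c] for c in range(2 ** level)]

print(is_equivalence_sequence(M, LEVELS))  # True
for level in range(1, LEVELS + 1):
    print(level, subsets(M, level))
```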

A trellis code is linear for an operation called a sum iff the sum of any two codewords 
is a codeword. For example, convolutional codes are linear because the modulo-two sum 
of any two codewords gives a binary codeword [8]. Linearity can be defined with respect 
to an equivalence relationship defined on signals [5]. The equivalence classes defined by a 
relation (R) partition the signal set into subsets. Let a sum be defined on these subsets 
and a trellis code be labeled with subsets. The codewords of such a code are sequences 
of subsets, and the code is linear with respect to (R) iff the sum of any two codewords 
(sequences of subsets) is another codeword (sequence of subsets). 

When the trellis is labeled with subsets, the “sum” of signals need not be defined, and linearity with respect to $(R)$ is less stringent than linearity with respect to signals. When the signal set is given the structure of a group and partitioned into cosets, linearity with respect to cosets is the same as linearity with respect to signals, because the sum is then defined from the group structure. This can be shown as follows. Suppose that a code is
linear with respect to the cosets of a subgroup of the signal set. Consider two sequences 
of signals through the trellis: they define two sequences of cosets whose sum is a codeword 
because the code is linear with respect to cosets. The sum of the two sequences of signals 
corresponds to that codeword and must be a channel path through the trellis, which proves 
that the code is linear with respect to signals. 

A trellis code is regular iff the distance between two codewords that correspond to 
distinct information sequences depends only on the binary sum of the two information 
sequences (we assume that the distance is an additive metric). Once again, codewords 
can be sequences of subsets and regularity can be defined with respect to an equivalence 
relation [3]. In such a case, the distance between subsets is the minimum distance between 
the signals in the subsets. 
Regularity makes the calculation of the distance properties of a code feasible. For regular
codes, the set of distances of incorrect paths from a correct path does not depend on the 
correct path. This means that the distance properties of regular codes can be calculated by 
assuming that an arbitrary correct path was sent. In practice, it is assumed that the path 
generated by the all-zero information sequence was sent. This assumption considerably 
reduces the complexity of distance spectrum calculations, since only one among many 
correct paths must be evaluated. Unlike linearity, regularity with respect to cosets is not 
equivalent to regularity with respect to signals. There are no known bandwidth efficient 
codes that are regular with respect to signals, and it can be conjectured that none exist. 
However, there exist many known bandwidth efficient codes that are regular with respect 
to cosets [3] and [5]. 

Ungerboeck was the first to show that for certain non-regular codes the free distance can 
be calculated by assuming that the all-zero information sequence was sent [10]. This can 
be generalized to any trellis code, regular or not, but leads to far more complex algorithms 
than the one presented in the next section. Instead, we define the class of quasi-regular 
codes to be non-regular codes for which the distance spectrum can be calculated with a 
relatively simple algorithm by assuming that the all-zero information sequence was sent [9]. 

A mapping of signal selectors onto a signal set is regular iff the distance between two 
signals depends only on the Hamming distance between their signal selectors. For example, 
there is no regular mapping of eight signal selectors onto an 8-PSK signal set. Similarly, 
there is no regular mapping from four signal selectors onto 4-PAM or from 16 signal selectors
onto 16-QASK. Figure 3 shows a non-regular mapping onto 8-PSK. This particular mapping 
is known as the natural mapping. The regularity of a mapping can also be defined with 
respect to subsets. 
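The definition can be tested mechanically. The sketch below assumes unit-energy $M$-PSK geometry and represents a mapping as a list giving the signal index for each selector value (both assumptions of this illustration). It confirms that the natural 8-PSK mapping is not regular, and for contrast that Gray-mapped 4-PSK is regular.

```python
import math
from itertools import product

def d2(i: int, j: int, m: int) -> float:
    """Squared distance between unit-energy M-PSK signals i and j."""
    return 4 * math.sin(math.pi * (i - j) / m) ** 2

def is_regular(mapping, n_bits: int, m: int) -> bool:
    """mapping[v] is the signal index for selector v. The mapping is regular iff
    the distance between two signals depends only on the Hamming distance
    between their selectors."""
    seen = {}
    for u, v in product(range(2 ** n_bits), repeat=2):
        hamming = bin(u ^ v).count("1")
        dist = round(d2(mapping[u], mapping[v], m), 9)  # rounding absorbs float noise
        if seen.setdefault(hamming, dist) != dist:
            return False
    return True

print(is_regular(list(range(8)), 3, 8))  # natural 8-PSK mapping: False
print(is_regular([0, 1, 3, 2], 2, 4))    # Gray-mapped 4-PSK: True
```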

Let $s$ and $\hat s$ be two states in a trellis code, and let the signal selector error vector $e$ be the binary sum of two signal selectors. Then the distance polynomial $P_{s,\hat s,e}(x)$ represents the set of distances between signals generated from $s$ and $\hat s$ whose signal selectors differ by $e$. The polynomials $P_{s,\hat s,e}(x)$ are only defined for those $e$ for which there exists a branch that leaves $s$ and a branch that leaves $\hat s$ whose signal selectors differ by $e$. Let $y(v)$ be the signal in $S$ whose signal selector is the binary $n$-tuple $v$. Then the polynomial $P_{s,\hat s,e}(x)$ is given by

$$ P_{s,\hat s,e}(x) = \sum_{v \mid s} p(v \mid s)\, x^{d[y(v),\, y(v \oplus e)]}, $$

where $p(v \mid s)$ is the probability of the signal selector $v$ given that the encoder is in state $s$, and $d[y(v), y(v \oplus e)]$ is the squared Euclidean distance between $y(v)$ and $y(v \oplus e)$. For example, for 8-PSK, if $y(000)$, $y(010)$, $y(100)$, $y(110)$ leave state $s$ and $y(001)$, $y(011)$, $y(101)$, $y(111)$ leave state $\hat s$, only four error vectors $e$ are possible between the branches that leave $s$ and $\hat s$ (Figure 3). These four error vectors are 001, 011, 101, and 111, and the corresponding distance polynomials satisfy

$$ \begin{aligned} P_{s,\hat s,001}(x) &= x^{\delta_0}, \\ P_{s,\hat s,011}(x) &= \tfrac{1}{2} x^{\delta_0} + \tfrac{1}{2} x^{\delta_2}, \\ P_{s,\hat s,101}(x) &= x^{\delta_2}, \\ P_{s,\hat s,111}(x) &= \tfrac{1}{2} x^{\delta_0} + \tfrac{1}{2} x^{\delta_2}, \end{aligned} $$

where $\delta_0$ and $\delta_2$ denote the squared Euclidean distances between 8-PSK signals that are one and three positions apart, respectively.
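The example polynomials can be reproduced with a short sketch. It assumes unit-energy 8-PSK with the natural mapping of Figure 3 (selector value equals signal index), equiprobable branches so that $p(v \mid s) = 1/4$, and, as in the example, that the branches leaving $\hat s$ carry the odd selectors. Each polynomial is returned as a dictionary mapping a squared distance (rounded to six decimals) to its coefficient; in the output, 0.585786 plays the role of $\delta_0$ and 3.414214 of $\delta_2$.

```python
import math
from collections import defaultdict
from fractions import Fraction

def d2(i: int, j: int) -> float:
    """Squared distance between unit-energy 8-PSK signals i and j."""
    return 4 * math.sin(math.pi * (i - j) / 8) ** 2

def distance_polynomial(selectors_s, e):
    """P_{s,shat,e}(x) as {squared distance: coefficient}, with p(v|s) uniform
    over the selectors leaving s and the natural mapping y(v) = v."""
    p = Fraction(1, len(selectors_s))
    poly = defaultdict(Fraction)
    for v in selectors_s:
        poly[round(d2(v, v ^ e), 6)] += p
    return dict(poly)

even = [0b000, 0b010, 0b100, 0b110]   # selectors on the branches leaving s
for e in (0b001, 0b011, 0b101, 0b111):
    print(format(e, "03b"), distance_polynomial(even, e))
```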

These polynomials look similar to the “weight profiles” defined by Zehavi and Wolf in a paper on the performance of rate $k/(k+1)$ trellis codes mapped by set partitioning [14].
However, weight profile polynomials are defined from a knowledge of the signal set and the 
mapper only, whereas the above polynomials are code dependent. This allows the definition 
to apply to a large class of codes of any rate, which includes the codes treated by Zehavi 
and Wolf. 

A trellis code is quasi-regular iff (i) it consists of a linear binary encoder followed by a mapper and (ii) for all $e$ and all pairs of states $(s_1, \hat s_1)$ and $(s_2, \hat s_2)$, $P_{s_1,\hat s_1,e}(x) = P_{s_2,\hat s_2,e}(x)$ (provided that the two polynomials are defined). By definition, for regular codes, the distance between signals depends only on the binary sum $e$ of their signal selectors ($P_{s,\hat s,e}(x)$ is a monomial which does not depend on $s$ or $\hat s$), and regular codes are quasi-regular. Since linear codes are regular, they are also quasi-regular. In the previous example of trellis coded 8-PSK, the two polynomials $P_{s,\hat s,011}(x)$ and $P_{s,\hat s,111}(x)$ are not monomials, and the code cannot be regular. However, it is quasi-regular because $P_{s,\hat s,e}(x)$ does not depend on $(s, \hat s)$.
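Condition (ii) can be checked by brute force for a given state labeling. In the sketch below, each state is labeled with the set of 8-PSK selectors on its outgoing branches (natural mapping, equiprobable branches). The 4-state labeling in which every state emits either the even or the odd half of 8-PSK is hypothetical, chosen in the spirit of Lemma 4 below, and it passes the check. A deliberately mismatched labeling, used only in the accompanying check, fails it.

```python
import math
from collections import defaultdict
from fractions import Fraction
from itertools import product

def d2(i: int, j: int) -> float:
    """Squared distance between unit-energy 8-PSK signals i and j."""
    return 4 * math.sin(math.pi * (i - j) / 8) ** 2

def poly(sel_s, sel_shat, e):
    """P_{s,shat,e}(x) as a frozenset of (distance, coefficient) pairs, or None
    when no branch leaving s and branch leaving shat differ by e. For the
    half-set labelings used here, v ^ e leaves shat for all v or for none."""
    if not any(v ^ e in sel_shat for v in sel_s):
        return None
    p = Fraction(1, len(sel_s))
    out = defaultdict(Fraction)
    for v in sel_s:
        out[round(d2(v, v ^ e), 6)] += p
    return frozenset(out.items())

def is_quasi_regular(labels):
    """Condition (ii): for every e, all defined P_{s,shat,e}(x) are identical."""
    for e in range(8):
        found = {poly(labels[s], labels[t], e) for s, t in product(labels, repeat=2)}
        found.discard(None)
        if len(found) > 1:
            return False
    return True

EVEN, ODD = frozenset({0, 2, 4, 6}), frozenset({1, 3, 5, 7})
# Hypothetical 4-state labeling: each state emits one half of 8-PSK.
print(is_quasi_regular({0: EVEN, 1: ODD, 2: EVEN, 3: ODD}))  # True
```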

Let $V$ be the set of signal selectors generated by the underlying binary code and $y(v)$ be the signal in $S$ selected by some $v \in V$. To each signal selector error $e$ corresponds a unique set of distances $\{d[y(v), y(v \oplus e)],\ v \in V\}$. We define $w(e)$ as the minimum distance of this set. Originally, Ungerboeck [10] computed the free distance of his codes by assuming that the all-zero information sequence was sent and replacing $d[y(v), y(v \oplus e)]$ with a lower bound on $w(e)$ in the computation. In a more recent publication, the free distance was computed directly from the values of $w(e)$ [11]. Similarly, it will be shown later that the distance spectrum of quasi-regular codes can be computed from the all-zero path by using the $w(e)$'s.
The set of signal selector error vectors $e$ for which there exists a $v$ such that $d[y(v), y(v \oplus e)] > w(e)$ is denoted $E$. For example, for the mapping in Figure 3, $E = \{011, 111\}$, which means that the distance between signals whose selectors differ by 011 or 111 is not unique.
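Both $w(e)$ and $E$ follow directly from the mapping. The sketch below assumes the natural 8-PSK mapping with unit energy and takes $V$ to be all eight selectors (an assumption: an actual code may generate fewer). It recovers $E = \{011, 111\}$ for the example above.

```python
import math

def d2(i: int, j: int) -> float:
    """Squared distance between unit-energy 8-PSK signals i and j."""
    return 4 * math.sin(math.pi * (i - j) / 8) ** 2

def worst_case_distances(selectors):
    """Return w(e) for every error vector e, and the set E of error vectors whose
    distance d[y(v), y(v XOR e)] is not unique over v in `selectors`."""
    w, big_e = {}, set()
    for e in range(8):
        dists = {round(d2(v, v ^ e), 6) for v in selectors if v ^ e in selectors}
        if not dists:
            continue  # no pair of selectors differs by e
        w[e] = min(dists)
        if max(dists) > w[e]:
            big_e.add(e)
    return w, big_e

w, E = worst_case_distances(range(8))       # V = all eight selectors (assumption)
print(sorted(format(e, "03b") for e in E))  # ['011', '111']
```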

A distance spectrum contains all the distances between codewords, even infinite distances between codewords that never remerge. In order to avoid dealing with these infinite distances, we consider only paths of finite length $l > 0$. The distance spectrum $SP^{(l)}(x)$ at depth $l$ of a trellis code satisfies

$$ SP^{(l)}(x) = \sum_{y} p(y) \sum_{\hat y} x^{d(y, \hat y)}, \qquad (5) $$

where $y = \{y_1, \ldots, y_l\}$ is a correct path of length $l$, $\hat y = \{\hat y_1, \ldots, \hat y_l\}$ is an incorrect path diverging from $y$ at time 1 such that $(y, \hat y)$ is a first event error of length $l$, and $p(y)$ is the probability of the correct path $y$. $SP^{(l)}(x)$ represents all possible distances between paths of length $l$ which leave the same state at time 1. The distance spectrum is entirely defined by $SP^{(l)}(x)$ for all $l$.

The worst case distance spectrum of a trellis code is derived from the distance spectrum of the code by replacing the distance $d[y(v), y(\hat v)]$ by $w(v \oplus \hat v)$ for all pairs of signals $(y(v), y(\hat v))$. Two paths are considered to have the same worst case distance only if they have the same distance calculated using the $w(e)$'s and the same number of occurrences of $e$ for each $e \in E$. This means that the worst case distance spectrum contains some topological information about paths.

Most of the best known trellis codes consist of a binary linear convolutional encoder followed by a mapper. In this case, the worst case distance spectrum has a simple expression.

Lemma 1: The worst case distance spectrum at depth $l$ of a trellis code which consists of a binary linear convolutional encoder followed by a mapper satisfies

$$ SP_w^{(l)}(x) = \sum_{e \neq 0} \prod_{\tau=1}^{l} x^{w(e_\tau)}, $$

where the sum is over all nonzero signal selector error sequences $e = \{e_1, \ldots, e_l\}$ of length $l$ generated by the underlying code and $\tau$ represents a time index.

Proof: Since the underlying code is linear, for any correct path $y$, the set of error events $(y, \hat y)$ can be described by the nonzero codewords of the code. The Lemma follows as a simple consequence of the definitions of the distance spectrum and of the worst case distance spectrum.
The next two Lemmas are also immediate consequences of the definitions of the distance
spectrum and of the worst case distance spectrum. 

Lemma 2: The worst case distance spectrum of a trellis code which consists of a binary 
linear convolutional encoder followed by a mapper can be computed by assuming that the 
correct codeword corresponds to the all-zero information sequence. 

Proof: The set of selector error sequences is exactly the set of binary codewords of the underlying code. Thus, the worst case distance spectrum depends only on the underlying code and on the distance function $w(\cdot)$. The distance function $w(\cdot)$ depends only on the signal set and the mapping and not on the regularity of the code. Since the codewords of a linear code can be calculated by assuming that the all-zero information sequence was sent, the worst case distance spectrum of any trellis code generated by a binary linear convolutional encoder can be computed by assuming that the all-zero information sequence was sent.
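As a sanity check of Lemmas 1 and 2 (this is not an implementation of the stack algorithm of Section 3), the sketch below enumerates by brute force the first events leaving the all-zero state of a small code and sums $w(e_\tau)$ along each selector error sequence. The encoder is hypothetical: a 4-state feedforward encoder with state $(s_1, s_2)$, selector bits $v^0 = s_1$, $v^1 = u^1 \oplus s_2$, $v^2 = u^2$ (uncoded), and the natural 8-PSK mapping. It was chosen only so that all branches leaving a state share $v^0$, in the spirit of set partitioning, and it is not claimed to be one of the codes in [10]. Distances are rounded to six decimals so that they can serve as dictionary keys.

```python
import math
from collections import defaultdict

def d2(i: int, j: int) -> float:
    """Squared distance between unit-energy 8-PSK signals (natural mapping)."""
    return 4 * math.sin(math.pi * (i - j) / 8) ** 2

# w(e): minimum distance over all selector pairs differing by e.
W = {e: min(d2(v, v ^ e) for v in range(8)) for e in range(8)}
ZERO = (0, 0)

def step(state, u1, u2):
    """Hypothetical linear encoder; state (s1, s2) holds past coded input bits.
    Returns (next state, selector v2 v1 v0 packed into an integer)."""
    s1, s2 = state
    v0, v1, v2 = s1, u1 ^ s2, u2
    return (u1, s1), (v2 << 2) | (v1 << 1) | v0

def worst_case_spectrum(max_len):
    """Worst case spectrum of first events of length <= max_len, computed from
    the all-zero path only (Lemma 2): since the encoder is linear, the inputs
    below are *error* inputs and the outputs are the selector errors e_tau."""
    spectrum = defaultdict(int)
    stack = [(ZERO, 0, 0.0)]                 # (state, length, distance)
    while stack:
        state, length, dist = stack.pop()
        for u1 in (0, 1):
            for u2 in (0, 1):
                if length == 0 and u1 == u2 == 0:
                    continue                 # zero error branch = correct path
                nxt, e = step(state, u1, u2)
                new_dist = dist + W[e]
                if nxt == ZERO:
                    spectrum[round(new_dist, 6)] += 1   # first remerge
                elif length + 1 < max_len:
                    stack.append((nxt, length + 1, new_dist))
    return dict(sorted(spectrum.items()))

print(worst_case_spectrum(4))
```

The line at distance 4.0 comes from the single parallel-transition error (selector error 100); the other lines come from error events of length 3 and 4.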

Lemma 3: The worst case distance spectrum and the distance spectrum of a regular code 
which consists of a binary linear convolutional encoder followed by a mapper are equal. 

Proof: Given a correct path $y$, to each incorrect path $\hat y$ corresponds a unique nonzero error sequence $e$, which is a codeword since the sum of any two codewords in a linear code is a codeword. Therefore, the sum over $\hat y$ in (5) can be replaced with a sum over $e \neq 0$. Since the code is regular, $d(y_\tau, \hat y_\tau) = w(e_\tau)$, where $e_\tau$ is the binary sum of the signal selectors of $y_\tau$ and $\hat y_\tau$. Equation (5) can then be rewritten as

$$ \begin{aligned} SP^{(l)}(x) &= \sum_{y} p(y) \sum_{e \neq 0} \prod_{\tau=1}^{l} x^{w(e_\tau)} \\ &= \sum_{e \neq 0} \prod_{\tau=1}^{l} x^{w(e_\tau)} \sum_{y} p(y) \\ &= \sum_{e \neq 0} \prod_{\tau=1}^{l} x^{w(e_\tau)}, \end{aligned} \qquad (6) $$

since the sum over $e \neq 0$ does not depend on the correct path $y$ and $\sum_y p(y) = 1$. Equation (6) is exactly the worst case distance spectrum of the code, which proves the Lemma.

The next Theorem is the backbone of the algorithm for computing the distance spectrum of trellis codes. The proof of the Theorem consists of expressing the distance spectrum of a quasi-regular code as a product of the polynomials $P_e(x)$ defined above. Then this product can be computed from the worst case distance spectrum, provided that the underlying code is linear. Since the worst case distance spectrum of quasi-regular codes can be computed by assuming that the all-zero information sequence was sent, the distance spectrum can also be computed from the all-zero information sequence. The idea of the proof is to group together all the error events that correspond to the same sequence of signal selector errors. The proof is by induction on the length of this error sequence. For each selector error in a sequence, the distance can be computed using the polynomial $P_e(x)$, provided that the error $e$ is the sum of two signal selectors that leave the correct and incorrect states, respectively.

Theorem 1: The distance spectrum of a quasi-regular code can be computed from its 
worst case distance spectrum. 

Proof: The proof of Theorem 1 is given in Appendix A. 

The proof of Theorem 1 shows that to compute $SP^{(l)}(x)$ it is sufficient to replace $x^{w(e)}$ in $SP_w^{(l)}(x)$ with $P_e(x)$ whenever $e \in E$, which can be done by knowing the number of occurrences of each $e \in E$ for each incorrect path. Theorem 1 is similar to Zehavi and Wolf's first Theorem, which states that the distance spectrum of Ungerboeck's codes can be computed from a state diagram with $2^\nu$ states, where $\nu$ is the constraint length of the code [14]. The proof given in their paper is simpler than ours because it applies to a smaller class of codes.

The main advantage of our approach is that it allows the computation of the performance of trellis codes with significant constraint lengths, whereas Zehavi and Wolf's approach requires the computation of a modified transfer function and is limited to very small constraint lengths. The algorithm described in the next section computes the worst case distance spectrum of quasi-regular codes by assuming that the all-zero information sequence was sent. Hence, the distance spectrum of quasi-regular codes can be computed by assuming that the all-zero information sequence was sent, although, unlike for regular codes, not all correct paths give the same distribution of distances.

Lemma 4: Ungerboeck rate $k/(k+1)$ systematic codes are quasi-regular.

Proof: Because of the rate of the codes, half the signals in the signal set leave each state in the trellis. Two cases can occur: either $s$ and $\hat s$ are labeled with the same half of the signal set, or they are labeled with different halves. If $(s_1, \hat s_1)$ and $(s_2, \hat s_2)$ correspond to the same case, then $P_{s_1,\hat s_1,e}(x) = P_{s_2,\hat s_2,e}(x)$. If the two pairs correspond to different cases, then one of the two polynomials is not defined because distinct cases correspond to distinct sets of signal selector errors (distinct least significant bits in $e$). This proves the Lemma.

Ungerboeck rate k/(k + 1) systematic codes are quasi-regular, which allows fast distance 
computation and code search algorithms [10]. Quasi-regularity does not require that the 
set of distances of paths from a correct path is the same for all correct paths. It only 
requires that the distance spectrum be calculable from the set of distances from the all-zero path.

3 The Algorithm 

The algorithm is a modified version of Chevillat’s stack algorithm for calculating the distance profile of convolutional codes [4]. Bahl and Jelinek [1] and later Larsen [7] used a bidirectional algorithm for calculating the free distance of convolutional codes which extends paths forward and backward simultaneously. Our algorithm is also bidirectional.

These previous algorithms terminate when the free distance is reached, whereas our 
algorithm continues to compute the higher distance spectral lines. It keeps track of the 
number of paths with the same distance and of the total information sequence weight along 
these paths. All the paths that reach a given state are retained, whether they have different 
distances or not. (In conventional free distance computation algorithms, only the path with 
the smallest distance is retained.) When a merger is detected, two cases can occur: (1) if 
no previous merging paths had a distance equal to that of the new merger, a new spectral 
line is created, or (2) if the distance of the merging paths is equal to the distance of an 
existing spectral line, the multiplicity of the line is incremented. Naturally, this algorithm 
requires more computation and storage than conventional free distance algorithms because 
no paths are discarded. 

The complexity of the stack algorithm depends on the number of paths that must be extended and not on the constraint length of the code. The multiplicity of a spectral line of distance $d$ is final only when all remaining paths have a distance larger than $d$. In most codes, the longest free distance paths are several constraint lengths long. Similarly, it can be conjectured that the longest paths that have a certain distance are several constraint lengths longer than the shortest paths with the same distance.

For example, for a code whose shortest register length is 3, the shortest mergers are 4 branches long. The longest error event with the free distance will be about 20 branches long. If the code is a rate 2/3 code, the number of paths of length 20 is $4^{20} \approx 10^{12}$. Computing the free distance consists of finding the path among these $10^{12}$ paths that has the smallest distance. Computing the multiplicity of the free distance consists of calculating the number of paths whose distance equals the free distance, and so on for each spectral line. Obviously, this can be an enormous task, even for simple codes.

A forward path leaves the all-zero state on a non-zero branch in the forward direction, 
and once it has remerged with the all-zero path it is discarded. A backward path leaves the 
all-zero state on a non-zero branch in the backward direction, i.e., it consists of a succession 
of branches that lead to the all-zero state. 

A path is determined by the following information:

Direction        fw or bw
$s$              Terminal state
$d$              Distance
$l$              Length
$W$              Array of occurrences of $e \in E$
$A_{s,d,l,W}$    Multiplicity
$B_{s,d,l,W}$    Information weight

The terminal state $s$ is the last state reached by a path of length $l$ branches from the all-zero state for a forward path and to the all-zero state for a backward path. The distance $d$ is the worst case distance along that path, i.e., the distance calculated from the $w(e)$'s. For each $e \in E$, the array $W$ records the number of branches on the incorrect path whose signal selector is $e$; because the signal selector on the correct path is always zero, this is the number of occurrences of the selector error $e$. Two paths are identical iff they have the same structure, i.e., the same distance $d$, length $l$, and array $W$. The multiplicity $A_{s,d,l,W}$ is the number of identical paths ending in state $s$ with distance $d$, length $l$, and array $W$. The information weight $B_{s,d,l,W}$ is the average of the information weights of all the identical paths (although the paths are identical, they correspond to different topological paths and may therefore have different information weights).
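A stack entry with these fields can be sketched as a small record. The field names below are illustrative, not from the report. The handling of the information weight when an identical path arrives (a multiplicity-weighted average, so that $B$ stays an average) is an assumption about the bookkeeping.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class PathEntry:
    """One stack entry; the field names are illustrative, not from the report."""
    direction: str                  # "fw" or "bw"
    state: int                      # terminal state s
    distance: float                 # worst case distance d (sum of the w(e)'s)
    length: int                     # number of branches l
    occurrences: Tuple[int, ...]    # W: occurrences of each e in E
    multiplicity: float = 1.0       # A_{s,d,l,W}
    info_weight: float = 0.0        # B_{s,d,l,W}: average information weight

    def key(self):
        """Entries with equal keys describe identical paths (same s, d, l, W)."""
        return (self.direction, self.state, round(self.distance, 9),
                self.length, self.occurrences)

    def absorb(self, other: "PathEntry") -> None:
        """Case (ii): fold an identical path into this entry. B is kept as a
        multiplicity-weighted average (an assumption about the bookkeeping)."""
        if self.key() != other.key():
            raise ValueError("paths are not identical")
        total = self.multiplicity + other.multiplicity
        self.info_weight = (self.multiplicity * self.info_weight
                            + other.multiplicity * other.info_weight) / total
        self.multiplicity = total
```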

In the stack algorithm, an ordered stack of previously examined paths with different parameters is stored. For the bidirectional algorithm, two stacks are necessary, one for each direction. Each stack entry contains a path with all its information. Paths are ordered by decreasing distances and lengths. Paths with the same distances and lengths but distinct terminal states or arrays $W$ are stored on a last in, first out basis within the stack. This does not affect the results of the algorithm, but it accelerates searching and sorting in the stack. The top path in the stack is the most likely to give the free distance or the shortest merging distance among all the paths in the stack, which is why it is extended first.
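One way to realize this ordering, assumed here rather than taken from the report, is a binary heap keyed on (distance, length, reversed insertion order). The entry popped first has the smallest distance, ties in distance go to the shorter path (our reading of "decreasing distances and lengths"), and exact ties come out last in, first out. The class and method names are hypothetical.

```python
import heapq
import itertools

class PathStack:
    """Sketch of one direction's stack: pop returns the entry with the smallest
    distance, then the smallest length; exact ties are last in, first out."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()

    def push(self, distance, length, payload):
        # The negated insertion counter makes newer entries win exact ties,
        # and it also prevents payloads from ever being compared.
        heapq.heappush(self._heap, (distance, length, -next(self._counter), payload))

    def pop(self):
        """Remove and return (distance, length, payload) of the top path."""
        distance, length, _, payload = heapq.heappop(self._heap)
        return distance, length, payload

    def min_distance(self):
        """Smallest distance still in the stack (needed for the stopping rule)."""
        return self._heap[0][0] if self._heap else float("inf")

    def __len__(self):
        return len(self._heap)
```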
Each complete sequence of steps consists of extending the top path in the stack by computing its $2^k$ successors. The terminal state of a successor cannot be the all-zero state, because a path must reach the all-zero state through the terminal state of a path from the opposite direction. If one of the successors reaches the all-zero state directly, it is discarded. Assume, without loss of generality, that a forward path is extended. Then three situations may occur:

(i) The state is not a terminal state for any forward path. Then a new entry is created 
in the stack of forward paths to store the new path and its parameters. 

(ii) The state is the terminal state of one or more forward paths. Compare the distance, length, and $W$ of the new path with those of the existing paths. If the new path is not identical to any old path, create a new entry. If it is identical to an old path, increment the multiplicity and information weight of that path.

(iii) The state is the terminal state of one or more backward paths. The path is then 
merged with these backward paths to form one or more error events. 

A forward path can reach the all-zero state only through the terminal state of a backward 
path, because all the branches that leave a state in the middle of a backward path have been 
extended. Thus, it is impossible to reach this middle state without following an extended 
branch. 

The spectral lines are stored by decreasing distance. Every time (iii) occurs, an old spectral line is incremented or a new one is created. The distance of an error event is the sum of the forward and backward distances, the length is the sum of the lengths, $W$ is the sum of the $W$'s, the multiplicity is the product of the multiplicities, and the information weight is $B_{fw} + B_{bw}$, where $B_{fw}$ and $B_{bw}$ are the forward and backward average information weights.
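The combination rule for a merger can be written as a small function. This is a sketch: the `Half` record and its field names are hypothetical, and the bookkeeping that adds the resulting event to a spectral line is omitted.

```python
from typing import NamedTuple, Tuple

class Half(NamedTuple):
    """A forward or backward path, or a merged error event (illustrative names)."""
    distance: float
    length: int
    occurrences: Tuple[int, ...]   # W
    multiplicity: float            # A
    info_weight: float             # average information weight B

def merge(fw: Half, bw: Half) -> Half:
    """Combine a forward and a backward path meeting at a common state, as in (iii):
    distances, lengths, and W's add; multiplicities multiply; B_fw + B_bw."""
    return Half(
        distance=fw.distance + bw.distance,
        length=fw.length + bw.length,
        occurrences=tuple(a + b for a, b in zip(fw.occurrences, bw.occurrences)),
        multiplicity=fw.multiplicity * bw.multiplicity,
        info_weight=fw.info_weight + bw.info_weight,
    )
```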

There is no specific time for the algorithm to terminate. It depends on the number 
of spectral lines required by the user, or on specified maximum path lengths or distances. 
Once the algorithm is terminated, the worst case distance spectrum is converted into the 
distance spectrum of the code, and each worst case spectral line is expanded into many 
new spectral lines [9]. 

Each worst case spectral line $(d, A_d, W)$ is expanded by replacing all possible combinations of the $w(e)$'s for which $e \in E$ by the other possible distances between signals whose selectors differ by $e$. For example, for 8-PSK, $E$ has two elements, 011 and 111, and if the $W$ of a particular worst case spectral line contains the two occurrences 1 and 3, it means that $e = 011$ occurred once and $e = 111$ occurred three times along each path represented by this spectral line. Since each element in $E$ corresponds to two possible distances (for 8-PSK), there are $2^4 = 16$ ways of combining these distances on the four $(1 + 3)$ branches that correspond to $e \in E$. This means that this particular worst case spectral line can be broken into 16 lines. The distance of each line is found by replacing $w(e)$ by the other possible distances for all $e \in E$.

The probability of a distance that corresponds to a specific $e \in E$ is given by the coefficient of that distance in $P_e(x)$. For example, if $P_e(x) = \tfrac12 x^{\delta_0} + \tfrac12 x^{\delta_2}$, the probabilities of $\delta_0$ and $\delta_2$ are both 1/2. For each one of the 16 lines in the above example, there are four distances that correspond to a signal selector from $E$. The probability of having a specific set of four distances is the product of the probabilities of the individual distances computed from the corresponding $P_e(x)$. Then the average multiplicity of each new line among the 16 lines is the product of $A_d$ and the probabilities of the four distances.

If the probabilities of the various distances are equal for all $e \in E$, the 16 lines will have the same average multiplicity, namely $A_d/16$ in this example.
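The expansion step can be sketched as follows. The sketch uses the two-valued $P_e(x)$ of the 8-PSK example, with $\delta_0$ and $\delta_2$ rounded to six decimals. The function name, the data layout, and the input line in the usage at the bottom are hypothetical. Lines that end up with equal distances could be merged afterwards by adding their multiplicities.

```python
from fractions import Fraction
from itertools import product

# Distance polynomials P_e(x) for e in E, as {distance: probability}, from the
# 8-PSK example (delta_0 and delta_2 rounded to six decimals).
DELTA0, DELTA2 = 0.585786, 3.414214
P = {
    0b011: {DELTA0: Fraction(1, 2), DELTA2: Fraction(1, 2)},
    0b111: {DELTA0: Fraction(1, 2), DELTA2: Fraction(1, 2)},
}

def expand_line(d, a_d, occurrences):
    """Split a worst case line (d, A_d, W) into lines of the distance spectrum.

    occurrences maps each e in E to its count along every path of the line.
    Returns one (distance, average multiplicity) pair per combination."""
    branches = [e for e, count in occurrences.items() for _ in range(count)]
    lines = []
    for choice in product(*(P[e].items() for e in branches)):
        dist, prob = d, Fraction(1)
        for e, (delta, p) in zip(branches, choice):
            dist += delta - min(P[e])   # replace w(e), the smallest distance in P_e
            prob *= p
        lines.append((round(dist, 6), a_d * prob))
    return lines

# Hypothetical line: W says e = 011 occurred once and e = 111 three times.
lines = expand_line(5.0, 4, {0b011: 1, 0b111: 3})
print(len(lines))   # 16
```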

The algorithm

Step 1. Load the stack with the origin node, with distance zero, length 0, and multiplicity 1. All the other parameters are set to zero. Enter the number $N$ of desired spectral lines.

Step 2. Compute the metric, length, multiplicity, information weight, and $W$'s of the successors of the top path in the stack.

Step 3. Delete the top path from the stack.

Step 4. For each successor, check if it merges and update the merger information [(iii)].

Step 5. Insert the new paths in the stack, and rearrange the stack in order of decreasing distance and length [(i) or (ii)].

Step 6. Output all the new spectral lines whose distance is smaller than the sum of the minimum forward and backward distances (since distances never decrease along a path, any merger found later has at least this distance). If fewer than $N$ spectral lines have been found, then change direction and repeat steps 1 to 6; otherwise, stop.

Figure 4 shows the first four forward and backward steps. The bold line represents the correct path (the all-zero path). The trellis code has four states. Step 1a: one forward path is extended (no remerging because state $s_2$ has not yet been reached). Note that the all-zero branch is not extended. Step 1b: one backward path is extended (no remerging because state $s_3$ has not yet been reached). Again, note that the all-zero branch is not extended. Step 2a: the top forward path, which terminates at $s_1$, is extended to $s_2$ and $s_3$. One successor terminates at $s_2$, which has been reached by a backward path, so a merger is found (dashed line). Step 2b: the backward path that terminates at $s_2$ is extended to reach $s_1$ and $s_3$ (one merger through $s_3$). Step 3a: the forward path that terminates at $s_3$ is extended to reach $s_2$ and $s_3$ (one merger through $s_3$). Step 3b: from $s_1$, only one backward branch is extended because paths are not allowed to reach the all-zero state directly. A path must remerge with the all-zero path through a terminal state from the opposite direction. This ensures that mergers are not counted several times. Since $s_2$ terminates two forward paths, two mergers are found. Step 4a: again only one branch is extended because the other one reaches the all-zero state directly (no mergers because $s_1$ is not the terminal state of any backward path). Step 4b: the backward path that terminates at $s_3$ is extended to reach $s_1$ and $s_3$ (one merger is found per new state).

The beginning of the distance spectrum can be used to upper bound the first error event probability and the bit error probability of trellis codes. We calculate these performance bounds for some of the best known codes [10]. Figure 5 shows coded 8-PSK for constraint lengths 4, 6, 8, and 10 compared to uncoded QPSK. Figure 6 shows the distance spectrum of 16-state coded 8-PSK. The free distance of this code is 5.13, and the multiplicity of the free distance is 2.25, i.e., on average, 2.25 incorrect paths are at distance 5.13 from the correct path. The next spectral line is relatively far from the free distance, and its multiplicity is still moderate. This means that for high enough SNR, the free distance is a relatively accurate indication of the performance of this particular code. The larger spectral lines are more closely spaced than the smaller spectral lines. This is a common property of trellis codes, as opposed to convolutional codes, where the spectral lines are separated by integer distances. Note that the multiplicities of the large spectral lines are large.
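As a rough illustration of how the first spectral lines feed such a bound, the sketch below evaluates a truncated union estimate of the first error event probability on the AWGN channel, $P_e \approx \sum_d N(d)\,Q\big(\sqrt{d\,E_s/(2N_0)}\big)$. It assumes that each spectral line is given as a squared Euclidean distance normalized to unit signal energy together with its multiplicity; the function names are hypothetical. Only the free-distance line (5.13, 2.25) quoted above is used in the example; further lines would have to be taken from the actual spectrum, and truncating the sum makes the result an estimate rather than a strict bound.

```python
import math

def q_function(x: float) -> float:
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def truncated_union_bound(spectrum, es_n0_db: float) -> float:
    """Sum N(d) * Q(sqrt(d * Es / (2 N0))) over the supplied spectral lines.

    spectrum: iterable of (d2, multiplicity) pairs, where d2 is assumed to be a
    squared Euclidean distance normalized to unit signal energy.
    """
    es_n0 = 10.0 ** (es_n0_db / 10.0)
    return sum(mult * q_function(math.sqrt(d2 * es_n0 / 2.0)) for d2, mult in spectrum)

# Only the free-distance line quoted in the text; further lines would be appended here.
spectrum_16state_8psk = [(5.13, 2.25)]
for snr_db in (6.0, 8.0, 10.0):
    print(snr_db, truncated_union_bound(spectrum_16state_8psk, snr_db))
```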

We have also noted that the distance spectrum of regular codes is usually denser than the distance spectrum of non-regular codes for the smaller spectral lines. This is because the free distance and most of the smaller spectral lines in a non-regular code are exceptional events and do not occur for all correct paths. Figure 7 shows a comparison of the simulation results obtained by Ungerboeck [10] with the performance bounds computed from the distance spectrum. These codes are typically used at error probabilities below $10^{-5}$. For such low probabilities, simulations become difficult, and the distance spectrum bound is a good alternative. It is much tighter than estimates based on the free distance alone.
4 Conclusion


A class of non-regular codes, called quasi-regular codes, was defined whose distance spectrum can be calculated by assuming that the all-zero information sequence was sent. Convolutional codes and regular codes are both quasi-regular, as are most of the best known trellis codes. An algorithm to compute the distance spectrum of quasi-regular trellis codes was presented. Tight performance estimates can be calculated from the first few spectral lines of most of the best known codes, and several examples show that this is an attractive alternative to simulations.
Appendix A


A Proof of Theorem 1 


It is sufficient to prove that the distance spectrum at any depth $l$ of a quasi-regular code can be computed from its worst-case distance spectrum at depth $l$, since the distance spectrum can be computed from the distance spectra for all $l$. The distance spectrum at depth $l$ satisfies

$$SP^{(l)}(x) = \sum_{\mathbf{y}} p(\mathbf{y}) \sum_{\hat{\mathbf{y}}} \prod_{r=1}^{l} x^{d(y_r,\hat y_r)} \qquad \text{(A.1)}$$

where $(\mathbf{y},\hat{\mathbf{y}})$ represents the $l$ branches of a first event error of length $l$. A path $\mathbf{y}$ of length $l$ defines a unique state sequence $\mathbf{s} = \{s_0, s_1, \dots, s_l\}$ of $l+1$ states (note that two different paths may have the same state sequence if the trellis has parallel transitions). Let $s_{l-1}$ and $\hat s_{l-1}$ be the states of a correct and an incorrect path at time $l-1$, respectively. The paths $\mathbf{y}$ whose state at time $l-1$ is the specific state $s_{l-1}$ are denoted by $\mathbf{y}|s_{l-1}$. In (A.1), the correct paths that reach the same state $s_{l-1}$ are grouped together and the incorrect paths that reach the same state $\hat s_{l-1}$ are grouped together. Then,


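$$SP^{(l)}(x) = \sum_{s_{l-1}} p(s_{l-1}) \sum_{\hat s_{l-1}} \sum_{\mathbf{y}|s_{l-1}} p(\mathbf{y}|s_{l-1}) \sum_{\hat{\mathbf{y}}|\hat s_{l-1}} \prod_{r=1}^{l} x^{d(y_r,\hat y_r)}.$$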

Let $\mathbf{y}_{l-1}$ denote the first $l-1$ branches of $\mathbf{y}$. Then a path $\mathbf{y}$ can be broken into $\mathbf{y}_{l-1}$ and $y_l$. Given a state $s_{l-1}$, the probability of a path $\mathbf{y}$ of length $l$ is $p(\mathbf{y}|s_{l-1}) = p(\mathbf{y}_{l-1}|s_{l-1})\,p(y_l|s_{l-1})$, i.e., the probability of the path is the product of the probability of the first $l-1$ branches (which end in state $s_{l-1}$) and the probability of the last branch of $\mathbf{y}$ (which leaves state $s_{l-1}$). Therefore

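$$\begin{aligned}
SP^{(l)}(x) &= \sum_{s_{l-1}} p(s_{l-1}) \sum_{\hat s_{l-1}} \sum_{\mathbf{y}_{l-1}|s_{l-1}} p(\mathbf{y}_{l-1}|s_{l-1}) \sum_{y_l|s_{l-1}} p(y_l|s_{l-1}) \sum_{\hat{\mathbf{y}}|\hat s_{l-1}} \prod_{r=1}^{l} x^{d(y_r,\hat y_r)} \\
&= \sum_{s_{l-1}} p(s_{l-1}) \sum_{\hat s_{l-1}} \sum_{y_l|s_{l-1}} p(y_l|s_{l-1}) \sum_{\hat y_l|\hat s_{l-1}} x^{d(y_l,\hat y_l)} \sum_{\mathbf{y}_{l-1}|s_{l-1}} p(\mathbf{y}_{l-1}|s_{l-1}) \sum_{\hat{\mathbf{y}}_{l-1}|\hat s_{l-1}} \prod_{r=1}^{l-1} x^{d(y_r,\hat y_r)} \\
&= \sum_{s_{l-1}} p(s_{l-1}) \sum_{\hat s_{l-1}} \sum_{e_l|s_{l-1},\hat s_{l-1}} \sum_{y_l|s_{l-1}} p(y_l|s_{l-1})\, x^{d(y_l,\hat y_l)} \sum_{\mathbf{y}_{l-1}|s_{l-1}} p(\mathbf{y}_{l-1}|s_{l-1}) \sum_{\hat{\mathbf{y}}_{l-1}|\hat s_{l-1}} \prod_{r=1}^{l-1} x^{d(y_r,\hat y_r)}
\end{aligned} \qquad \text{(A.2)}$$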
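where $e_l|s_{l-1},\hat s_{l-1}$ is one of the signal selector errors between branches that leave $s_{l-1}$ and $\hat s_{l-1}$, respectively, and $\hat y_l$ is the branch leaving $\hat s_{l-1}$ whose signal selector differs from that of $y_l$ by $e_l$. The fourth sum can be replaced with the polynomial $P_{s_{l-1},\hat s_{l-1},e_l}(x)$, so that (A.2) becomes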

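$$SP^{(l)}(x) = \sum_{s_{l-1}} p(s_{l-1}) \sum_{\hat s_{l-1}} \sum_{e_l|s_{l-1},\hat s_{l-1}} P_{s_{l-1},\hat s_{l-1},e_l}(x) \sum_{\mathbf{y}_{l-1}|s_{l-1}} p(\mathbf{y}_{l-1}|s_{l-1}) \sum_{\hat{\mathbf{y}}_{l-1}|\hat s_{l-1}} \prod_{r=1}^{l-1} x^{d(y_r,\hat y_r)}.$$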

Because the code is quasi-regular, $P_{s,\hat s,e_l}(x)$ does not depend on $s$ and $\hat s$, provided that $e_l$ is a signal selector error between two branches that leave $s$ and $\hat s$, respectively (otherwise the polynomial is not defined). This allows us to interchange the summations over the states $s_{l-1}$ and $\hat s_{l-1}$ with the summation over $e_l$, so that

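$$SP^{(l)}(x) = \sum_{e_l} P_{s,\hat s,e_l}(x) \sum_{(s_{l-1},\hat s_{l-1}|e_l)} p(s_{l-1}) \sum_{\mathbf{y}_{l-1}|s_{l-1}} p(\mathbf{y}_{l-1}|s_{l-1}) \sum_{\hat{\mathbf{y}}_{l-1}|\hat s_{l-1}} \prod_{r=1}^{l-1} x^{d(y_r,\hat y_r)}$$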

where $(s_{l-1},\hat s_{l-1}|e_l)$ ranges over the pairs of states generating signal selectors that differ by $e_l$. All the first event errors of length $l$ that reach states $s_{l-1}$ and $\hat s_{l-1}$ and whose last branches differ by $e_l$ can be grouped together. These error events are denoted by $(\mathbf{y},\hat{\mathbf{y}})|e_l$ in the next equation. Also, $P_{s,\hat s,e_l}(x)$ can be written as $P_{e_l}(x)$, since it does not depend on $s$ and $\hat s$. Therefore

$$SP^{(l)}(x) = \sum_{e_l} P_{e_l}(x) \sum_{(\mathbf{y},\hat{\mathbf{y}})|e_l} p(\mathbf{y}_{l-1}) \prod_{r=1}^{l-1} x^{d(y_r,\hat y_r)}. \qquad \text{(A.3)}$$

From (A. 3) and the definition of the distance spectrum at depth l, 

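$$\sum_{(\mathbf{y},\hat{\mathbf{y}})} p(\mathbf{y}) \prod_{r=1}^{l} x^{d(y_r,\hat y_r)} = \sum_{e_l} P_{e_l}(x) \sum_{(\mathbf{y},\hat{\mathbf{y}})|e_l} p(\mathbf{y}_{l-1}) \prod_{r=1}^{l-1} x^{d(y_r,\hat y_r)}. \qquad \text{(A.4)}$$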

The same procedure can be repeated with the right side of (A.4) to express the distance spectrum at depth $l$ as a function of a sum over error events $(\mathbf{y},\hat{\mathbf{y}})$ conditioned on two consecutive time intervals (i.e., conditioned on $e_l$ and $e_{l-1}$). Furthermore, this result can be extended by induction on the number of signal selector errors on which the error event is conditioned, to obtain for any $1 \le \kappa \le l-1$

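$$\sum_{(\mathbf{y},\hat{\mathbf{y}})|e_l,\dots,e_{\kappa+1}} p(\mathbf{y}_{\kappa}) \prod_{r=1}^{\kappa} x^{d(y_r,\hat y_r)} = \sum_{e_\kappa} P_{e_\kappa}(x) \sum_{(\mathbf{y},\hat{\mathbf{y}})|e_l,\dots,e_{\kappa}} p(\mathbf{y}_{\kappa-1}) \prod_{r=1}^{\kappa-1} x^{d(y_r,\hat y_r)} \qquad \text{(A.5)}$$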

where $(\mathbf{y},\hat{\mathbf{y}})|e_l,\dots,e_{\kappa+1}$ is a first event error of length $l$ whose $(\kappa+1)$th through $l$th branches are labeled with signals whose selectors differ by $e_{\kappa+1},\dots,e_l$, respectively. In particular, for $\kappa = 1$, (A.5) becomes

$$\sum_{(\mathbf{y},\hat{\mathbf{y}})|e_l,\dots,e_2} p(\mathbf{y}_1)\, x^{d(y_1,\hat y_1)} = \sum_{e_1} P_{e_1}(x). \qquad \text{(A.6)}$$


Hence from (A.3), (A.4), (A.5), and (A.6), we obtain

$$SP^{(l)}(x) = \sum_{\mathbf{e} \neq \mathbf{0}} \prod_{r=1}^{l} P_{e_r}(x),$$

where the summation is over all nonzero signal selector error sequences $\mathbf{e} = (e_1, \dots, e_l)$ of length $l$. Note that when $e_r$ does not belong to the set $E$, then $P_{e_r}(x) = x^{w(e_r)}$. Assuming that the worst-case distance spectrum at depth $l$ can be computed, the distance spectrum can be calculated from it by replacing $x^{w(e_r)}$ with $P_{e_r}(x)$ for each $e_r \in E$. This concludes the proof.
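Once the polynomials $P_e(x)$ are known, the final formula of the proof can be evaluated mechanically. The sketch below stores each distance polynomial as a dictionary mapping a distance (exponent of $x$) to its coefficient, multiplies polynomials by adding exponents, and sums $\prod_r P_{e_r}(x)$ over every nonzero selector error sequence of length $l$, exactly as the summation is written. The two-entry selector alphabet and its polynomials in the example are hypothetical toy values, not taken from any code in the paper, and the brute-force enumeration grows exponentially with $l$.

```python
from itertools import product

def poly_mul(p, q):
    """Multiply two distance polynomials stored as {exponent: coefficient}; exponents add."""
    out = {}
    for dp, cp in p.items():
        for dq, cq in q.items():
            out[dp + dq] = out.get(dp + dq, 0.0) + cp * cq
    return out

def poly_add(p, q):
    """Add two distance polynomials coefficient by coefficient."""
    out = dict(p)
    for d, c in q.items():
        out[d] = out.get(d, 0.0) + c
    return out

def spectrum_at_depth(selector_polys, zero_selector, depth):
    """Sum over all nonzero selector error sequences e of length `depth` of prod_r P_{e_r}(x).

    selector_polys maps each signal selector error e to its polynomial P_e(x).
    """
    total = {}
    for seq in product(selector_polys, repeat=depth):
        if all(e == zero_selector for e in seq):
            continue  # the all-zero selector error sequence is excluded
        term = {0.0: 1.0}  # the constant polynomial 1
        for e in seq:
            term = poly_mul(term, selector_polys[e])
        total = poly_add(total, term)
    return total

# Hypothetical toy alphabet: P_0(x) = 1 and P_1(x) = x^2.
toy = {0: {0.0: 1.0}, 1: {2.0: 1.0}}
print(spectrum_at_depth(toy, 0, 2))
```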
Figure 1: Schematic representation of a typical trellis code.
Figure 2: A first event error.
Figure 4: Eight iterations of the algorithm.
Figure 5: Performance of coded 8-PSK.
Figure 6: Distance spectrum of 16-state coded 8-PSK.
Abstract

Achieving reliable digital communications over fading channels usually requires not only 
high signal energies, but also large bandwidth expansion factors, in particular for time 
diversity signaling. It is shown that bandwidth efficient data transmission using trellis 
coded modulation, introduced for the AWGN-channel, is also feasible on fading channels. 
The Chernoff bounding technique is used to obtain performance bounds for bandwidth 
efficient trellis codes on fading channels with various degrees of side information. New 
design criteria, the effective length and the minimum product distance, are introduced 
for trellis coded modulation on fading channels. Based on these design criteria, 8-PSK 
trellis codes for fading channels are constructed. The performance of the new trellis codes 
is analyzed for fading channels with different degrees of side information, and it is shown 
that the new codes have a significantly better error performance than codes of the same 
complexity designed for Gaussian channels. 


This work was supported by NASA Grant NAG 5-557.



1 Introduction 


In coding theory the most frequently assumed model for a transmission channel is the additive white Gaussian noise (AWGN) channel model. However, for many communication systems the AWGN-channel is a poor model, and one must resort to more precise and complicated models. One type of non-Gaussian model that frequently occurs in practice is the fading channel. An example of such a fading channel is the mobile satellite communication channel, which has been the subject of several recent articles [1-5]. Mobile satellite communication systems are usually used at low data rates. With linear predictive coding (LPC), digital voice transmission is possible at 2400 bits/s, and it is envisioned that mobile satellite channels can be used up to that data rate.

Fading occurs when the receiving antennas, such as the very small antennas used in mobile radio links, pick up multipath reflections. This causes the received signal energy to vary with time, which is called fading. While there are other degradations, such as time-varying dispersion, we will concentrate on the most basic model. We consider double sideband amplitude modulation (DSB-AM), and our receiver uses a DSB demodulator. The transmission channel that arises from such a system is shown in Figure 1. Fading comes about when the communication path is littered with “scattering particles”. If the number of scatterers is large, the received signals I(t) and Q(t) will be statistically independent Gaussian processes [2] [6], which translate into statistically independent Gaussian random variables $z_i$ and $z_q$ in signal space. If there is only a diffuse multipath signal, the mean values of $z_i$ and $z_q$ are zero, and the amplitude $b\sqrt{E_m} = \sqrt{z_i^2 + z_q^2}$ of the signal vector is Rayleigh distributed with

$$p(b) = 2b\,e^{-b^2}, \qquad (1)$$

where $E[b^2 E_m] = E[b^2]E_m = E_m$ is the average energy received via the diffuse multipaths.
If there is also a direct line-of-sight signal, with received signal energy $E_d$, the amplitude of the total signal is Rician distributed:

$$p(b) = 2b(1+K)\,e^{-K - b^2(1+K)}\, I_0\!\left(2b\sqrt{K(1+K)}\right), \qquad (2)$$

where $K = E_d/E_m$ is the ratio of the signal energy received on the direct path to the signal energy received via the diffuse multipaths, and $I_0(\cdot)$ is the zeroth-order modified Bessel function of the first kind. Note that (2) reduces to (1) when $K = 0$. We further define the total energy of the received signal $E_s = E_d + E_m$. In the Rayleigh case $E_s = E_m$, since the direct component is zero. With the use of trellis coded modulation, we rely on the feasibility of coherent reception and assume that the carrier phase can be recovered.

We now discuss the following communication system. The transmitter sends a sequence of 2-dimensional signals $\mathbf{x} = (x_1, \dots, x_l)$ over the fading channel, where each signal $x_r$ is chosen from some signal set $A = \{a_1, \dots, a_A\}$ of cardinality $A$. This signal is represented by two analog waveform signals I(t) and Q(t), which modulate the carrier signal $\sqrt{2}\cos\omega_0 t$ and its quadrature component $\sqrt{2}\sin\omega_0 t$. This modulation process translates the signals into the frequency band with center frequency $\omega_0$. This bandpass signal is then transmitted over a bandpass channel with both Gaussian noise and fading. At the receiver, the received waveform signal is demodulated into the baseband direct and quadrature components I(t) and Q(t), which are transformed by the baseband receiver into the sequence of 2-dimensional received signals $\mathbf{y} = (y_1, \dots, y_l)$. Each $y_r$ is a distorted copy of the transmitted signal $x_r$, i.e.,

$$y_r = b_r x_r + n_r, \qquad (3)$$

where $b_r$ is the multiplicative distortion introduced by the fading, whose density function is given in (2), and $n_r$ is a 2-dimensional Gaussian random variable with variance $N_0$. For mobile communications the fading usually varies slowly compared to the signal intervals, and therefore we assume that $b_r$ is constant throughout each time interval. The amplitude $b_r$ is called the fading depth at time $r$. Practical examples of such Rician fading channels are discussed by Hagenauer et al. in [2] and [3].



Figure 1: DSB transmission channel model used on the fading channel.
For binary orthogonal signaling on a Rayleigh fading channel, it can be shown that the error probability is given by [6, page 533]

$$P_b = \frac{1}{2 + E_s/N_0}, \qquad (4)$$


where $E_s$ is the mean value of the received signal energy and $E_s/N_0$ is the channel symbol signal-to-noise ratio (SNR). In contrast to the AWGN-channel, the error probability on a single transmission decreases only inversely with $E_s/N_0$. In order to reduce the error probability on a Rayleigh/Rician channel, one must get around the high probability of a deep fade on a single transmission. A popular method to achieve this is diversity transmission. One form of diversity transmission, time diversity, involves sending a symbol L times, where the receiver performs some averaging to achieve an error performance that decreases exponentially with the SNR, i.e.,

$$P_b < e^{-0.149\,E_s/N_0}. \qquad (5)$$


For a more detailed discussion on diversity signaling the reader is referred to [6], [7], and 
[8]. Retransmitting the same signal L times involves bandwidth expansion by a factor 
L, which is not tolerable in bandwidth limited environments. Another method is binary 
coding, which can yield an arbitrarily low error probability, but also at the expense 
of bandwidth. In this paper, we focus on bandwidth efficient trellis coded modulation 
(TCM) as a means of achieving reliable digital communications over fading channels 
without bandwidth expansion. Some of our results have been derived independently by 
Divsalar and Simon in a paper [9] that focuses on a discussion of multiple trellis coded 
modulation for fading channels. 
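To make the contrast between (4) and (5) concrete, the short sketch below evaluates both expressions at a few SNR values and computes the SNR penalty implied by the exponent 0.149 in (5). As a reference it assumes that binary orthogonal signaling on the AWGN channel has the error exponent $E_s/(2N_0)$ (its Chernoff bound is $\tfrac12 e^{-E_s/(2N_0)}$); under that assumption the ratio of exponents gives roughly the 5.25 dB loss quoted in Section 2.2. The function names are hypothetical.

```python
import math

def pb_single_rayleigh(es_n0: float) -> float:
    """Eq. (4): binary orthogonal signaling, single transmission, Rayleigh fading."""
    return 1.0 / (2.0 + es_n0)

def pb_diversity_bound(es_n0: float) -> float:
    """Eq. (5): exponential bound achievable with time diversity on Rayleigh fading."""
    return math.exp(-0.149 * es_n0)

for snr_db in (15.0, 20.0, 25.0):
    g = 10.0 ** (snr_db / 10.0)
    print(f"{snr_db:4.1f} dB  single: {pb_single_rayleigh(g):.2e}  diversity bound: {pb_diversity_bound(g):.2e}")

# Assumed AWGN reference: binary orthogonal signals with error exponent 0.5 * Es/N0.
loss_db = 10.0 * math.log10(0.5 / 0.149)
print(f"SNR loss of the diversity exponent relative to AWGN: {loss_db:.2f} dB")
```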


2 Performance Bounds 

2.1 Chernoff Factors 

In this section we present a general method for deriving error performance bounds for coded systems used on memoryless channels². We will apply these results to TCM communication systems whose structure is shown in Figure 2. A TCM communication system consists of a trellis encoder, a signal interleaver, the transmission channel, a signal deinterleaver, and a trellis decoder. A rate $R = k/n$ trellis code is generated by a binary convolutional encoder followed by a mapper. The convolutional encoder is a finite state automaton with $2^\nu$ possible states, where $\nu$ is the memory order of the encoder. At each time interval $r$, the encoder accepts $k$ binary input bits $(u^k, u^{k-1}, \dots, u^1)$ and makes a transition from its state $S_r$ at time $r$ to one of $2^{\tilde k}$ possible successor states $S_{r+1}$. The $\tilde n = n - (k - \tilde k)$ output bits of the convolutional encoder and the $k - \tilde k$ uncoded information bits $(u^k, \dots, u^{\tilde k+1})$ form one of $2^n$ binary $n$-tuples $v_r = (v^n, v^{n-1}, \dots, v^1)$, which is translated by the mapper into one of $A = 2^n$ channel signals from a signal set $A = \{a_1, a_2, \dots, a_A\}$. The uncoded information bits do not affect the state of the convolutional encoder and cause $2^{k-\tilde k}$ parallel transitions between the encoder states $S_r$ and $S_{r+1}$. Since the coherent DSB-AM system transmits two dimensions in one analog waveform signal, it is sensible to design the trellis encoder for 2-dimensional signal sets. A rate $R = k/n$ trellis code transmits $k$ bits/channel signal, where the channel signal set contains $A = 2^n$ signals. If such a TCM communication system replaces an uncoded system that uses a signal set with $A' = 2^k$ signals, the overall transmission rate is preserved, and we call such a TCM system bandwidth efficient.

2 This method can be extended to also include finite state channels.



Figure 2: Trellis coded communication system. 
The particular transmission channel discussed in this paper is the Rayleigh/Rician fading channel introduced in Section 1. The interleaver/deinterleaver converts the channel to a memoryless channel and ensures that the signals in the received sequence are independent. They are decoded by a maximum likelihood sequence estimator (usually using the Viterbi algorithm). The Viterbi algorithm finds the signal sequence that most closely corresponds to the sequence of received signals. It achieves this by calculating a decoding metric $m(\mathbf{x},\mathbf{y})$ between $\mathbf{x}$ and $\mathbf{y}$, where $\mathbf{x} = (x_1, \dots, x_l)$ is a possible sequence of 2-dimensional transmitted signals and $\mathbf{y} = (y_1, \dots, y_l)$ is the received sequence. The metric $m(\mathbf{x},\mathbf{y})$ is some non-negative function of $\mathbf{x}$ given $\mathbf{y}$, which is inversely related to the conditional probability that $\mathbf{x}$ was transmitted if $\mathbf{y}$ was received. The decoder then chooses the message sequence $\mathbf{x}$ for which this metric is minimized. It makes an error if it decodes a sequence $\mathbf{x}'$, given that the correct sequence, i.e., the transmitted sequence, was $\mathbf{x}$. This happens if $m(\mathbf{x}',\mathbf{y}) < m(\mathbf{x},\mathbf{y})$. In the case of channel state side information, the decoder uses $m(\mathbf{x},\mathbf{b},\mathbf{y})$ as its metric, where $\mathbf{b}$ is the side information obtained from the channel.

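The two code word error probability, i.e., the probability that $\mathbf{x}'$ is erroneously decoded if $\mathbf{x}$ is sent, is given by

$$P(\mathbf{x} \to \mathbf{x}') = \Pr\{m(\mathbf{x}',\mathbf{y}) - m(\mathbf{x},\mathbf{y}) < 0\}. \qquad (6)$$

We use the Chernoff bounding technique [7] to upper bound the above expression and obtain

$$\Pr\{m(\mathbf{x}',\mathbf{y}) - m(\mathbf{x},\mathbf{y}) < 0\} \le E_{\mathbf{y}|\mathbf{x}}\left[\exp\left(-\lambda\{m(\mathbf{x}',\mathbf{y}) - m(\mathbf{x},\mathbf{y})\}\right)\right], \qquad (7)$$

where $E_{\mathbf{y}|\mathbf{x}}$ denotes conditional expectation and $\lambda$ is a non-negative real-valued parameter over which we minimize the right-hand side of (7) to obtain the tightest possible exponential bound, i.e.,

$$P(\mathbf{x} \to \mathbf{x}') \le \min_{\lambda} E_{\mathbf{y}|\mathbf{x}}\left[\exp\left(-\lambda\{m(\mathbf{x}',\mathbf{y}) - m(\mathbf{x},\mathbf{y})\}\right)\right] = \min_{\lambda} C(\mathbf{x},\mathbf{x}',\lambda), \qquad (8)$$

where $C(\mathbf{x},\mathbf{x}',\lambda)$ is called the Chernoff bound between the signal sequences $\mathbf{x}$ and $\mathbf{x}'$.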

Restricting attention to decoders using additive metrics, i.e., 
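$$m(\mathbf{x},\mathbf{y}) = \sum_{r=1}^{l} m(x_r, y_r), \qquad (9)$$

we may rewrite (8) as

$$\begin{aligned}
P(\mathbf{x} \to \mathbf{x}') &\le \min_\lambda C(\mathbf{x},\mathbf{x}',\lambda) \\
&= \min_\lambda \prod_{r=1}^{l} E_{y_r|x_r}\left[\exp\left(-\lambda\{m(x'_r,y_r) - m(x_r,y_r)\}\right)\right] \\
&= \min_\lambda \prod_{r=1}^{l} C(x_r,x'_r,\lambda),
\end{aligned} \qquad (10)$$

where $C(x_r,x'_r,\lambda)$ is called the Chernoff factor of the signals $x_r$ and $x'_r$. The two code word error probability bound is now given by

$$P(\mathbf{x} \to \mathbf{x}') \le \min_\lambda \prod_{r=1}^{l} C(x_r,x'_r,\lambda). \qquad (11)$$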

The Chernoff factors are important because they not only streamline the expression for the two code word error probability but also apply to the transfer function bound for trellis codes introduced later and to the cutoff rate calculations. In particular, it can be shown [10] that $R_0$, the channel cutoff rate in bits/transmitted signal, is given by

$$R_0(p) = -\log_2 \min_\lambda \sum_{m=1}^{A} \sum_{p=1}^{A} p(a_m)\,p(a_p)\,C(a_m, a_p, \lambda), \qquad (12)$$

where $p(a_i)$ is the probability of choosing the signal $a_i \in A$. Note that $R_0$ depends on the particular metric $m(y_r, x_r)$ that is used by the decoder. If the decoder uses the maximum likelihood (ML) metric for a memoryless channel, i.e.,

$$m(\mathbf{y},\mathbf{x}) = -\log \Pr(\mathbf{y}|\mathbf{x}) = -\log \prod_{r=1}^{l} \Pr(y_r|x_r) = \sum_{r=1}^{l} \left(-\log \Pr(y_r|x_r)\right) = \sum_{r=1}^{l} m(y_r, x_r), \qquad (13)$$

then (12) becomes the channel cutoff rate for the optimum receiver, which is the usual definition of $R_0$ [6]. We will denote the value of $\lambda$ that maximizes the cutoff rate (12) by $\lambda_{R_0}$. In this case, the Chernoff factors will be written as $C(a_m, a_p) = C(a_m, a_p, \lambda_{R_0})$.
2.2 Fading Channel with Side Information

In this section we calculate the Chernoff factors under the assumption that the decoder is furnished with perfect side information, i.e., at each symbol interval $r$ the fading depth $b_r$ is known. We choose our decoding metric as $m(y_r, b_r, x_r) = |y_r - b_r x_r|^2$, which would be the maximum likelihood metric on an AWGN-channel if the transmitter sent the signal $b_r x_r$ during time interval $r$. Indeed, the decoder cannot tell whether the transmitter modulated the signal amplitude by multiplying it by the constant $b_r$ or whether the channel distorted the symbols in that way. Due to the additional stochastic process $\mathbf{b}$, the two code word error probability becomes


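$$P(\mathbf{x} \to \mathbf{x}') = E_{\mathbf{b}}\left[\Pr_{\mathbf{y}|\mathbf{x}}\{m(\mathbf{x}',\mathbf{b},\mathbf{y}) - m(\mathbf{x},\mathbf{b},\mathbf{y}) < 0\}\right], \qquad (14)$$

where $\mathbf{b} = (b_1, \dots, b_l)$ is the sequence of fading depths. The above probability can be expressed in terms of the Chernoff factors, i.e.,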


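$$\begin{aligned}
P(\mathbf{x} \to \mathbf{x}') \le \min_\lambda C(\mathbf{x},\mathbf{x}',\lambda) &= \min_\lambda E_{\mathbf{b}}\left[\prod_{r=1}^{l} E_{y_r|x_r,b_r}\left[\exp\left(-\lambda\left(|y_r - b_r x'_r|^2 - |y_r - b_r x_r|^2\right)\right)\right]\right] \\
&= \min_\lambda E_{\mathbf{b}}\left[\prod_{r=1}^{l} \exp\left(-\lambda b_r^2 |x_r - x'_r|^2 (1 - \lambda N_0)\right)\right].
\end{aligned} \qquad (15)$$

Each factor of the product inside the expectation is minimized by setting $\lambda = \lambda_{R_0} = 1/(2N_0)$, which removes the dependence on $\lambda$:

$$P(\mathbf{x} \to \mathbf{x}') \le E_{\mathbf{b}}\left[\prod_{r=1}^{l} \exp\left(-\frac{b_r^2}{4N_0}|x_r - x'_r|^2\right)\right] \qquad (16)$$

and

$$C(x_r, x'_r) = E_{b_r}\left[\exp\left(-\frac{b_r^2}{4N_0}|x_r - x'_r|^2\right)\right]. \qquad (17)$$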


Due to the interleaving the fading depths b r are governed by independent identically 
distributed probability distributions and the subscript r on b in the above equation can 
be dropped. Then we have 
$$C(x_r, x'_r) = E_b\left[\exp\left(-\frac{b^2}{4N_0}|x_r - x'_r|^2\right)\right] = \int_0^\infty p(b)\, e^{-\frac{b^2}{4N_0}|x_r - x'_r|^2}\, db, \qquad (18)$$

where $p(b)$ is the Rayleigh/Rician probability distribution given in (2). After some manipulations (18) becomes

$$C(x_r, x'_r) = \frac{1+K}{1+K+\frac{|x_r - x'_r|^2}{4N_0}} \exp\left(-\frac{K\,\frac{|x_r - x'_r|^2}{4N_0}}{1+K+\frac{|x_r - x'_r|^2}{4N_0}}\right). \qquad (19)$$
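To check the step from (18) to the closed form numerically, the sketch below integrates $\int_0^\infty p(b)\,e^{-s b^2}\,db$ with Simpson's rule, using the Rician density (2), and compares it with $\frac{1+K}{1+K+s}\exp\!\left(-\frac{Ks}{1+K+s}\right)$, where $s = |x_r - x'_r|^2/(4N_0)$. It is an illustrative sketch, not the authors' derivation: $I_0$ is approximated by its power series (adequate for the moderate arguments used here), the integral is truncated at a hypothetical upper limit `b_max`, and all function names are assumptions.

```python
import math

def bessel_i0(z: float, terms: int = 200) -> float:
    """I_0(z) from its power series sum_n (z/2)^(2n) / (n!)^2 (moderate z only)."""
    total, term = 1.0, 1.0
    for n in range(1, terms):
        term *= (z / 2.0) ** 2 / (n * n)
        total += term
        if term < 1e-17 * total:  # remaining terms are negligible
            break
    return total

def rician_pdf(b: float, k: float) -> float:
    """Eq. (2): normalized Rician density of the fading depth b; k = 0 gives eq. (1)."""
    return 2.0 * b * (1.0 + k) * math.exp(-k - b * b * (1.0 + k)) * bessel_i0(2.0 * b * math.sqrt(k * (1.0 + k)))

def chernoff_closed_form(s: float, k: float) -> float:
    """Eq. (19) with s = |x_r - x'_r|^2 / (4 N0)."""
    return (1.0 + k) / (1.0 + k + s) * math.exp(-k * s / (1.0 + k + s))

def chernoff_numeric(s: float, k: float, b_max: float = 8.0, n: int = 4000) -> float:
    """Eq. (18) by composite Simpson's rule on [0, b_max] (n must be even)."""
    h = b_max / n
    f = lambda b: rician_pdf(b, k) * math.exp(-b * b * s)
    acc = f(0.0) + f(b_max)
    for i in range(1, n):
        acc += (4.0 if i % 2 else 2.0) * f(i * h)
    return acc * h / 3.0

for k in (0.0, 1.0, 4.0):
    print(k, chernoff_numeric(2.0, k), chernoff_closed_form(2.0, k))
```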


Figure 3 shows $R_0$ from (12) for the Rayleigh ($K = 0$) fading channel with side information for several two-dimensional signal constellations. In this case, the $R_0$ curves are shifted by 3-6 dB toward higher SNR relative to the AWGN channel [6], [10], but the relative performance of each signal constellation is preserved. It is worth noting from (5) that if diversity signaling is used with binary symbols on a Rayleigh fading channel, there is a loss of 5.25 dB in SNR compared to the AWGN-channel with the same noise power spectral density $N_0$, at the expense of considerable bandwidth expansion. The $R_0$ curves for the fading channel, on the other hand, show that good error performance (i.e., error probabilities comparable to those on the AWGN-channel) is possible without the bandwidth expansion introduced by diversity signaling, albeit at higher values of the SNR $E_s/N_0$.
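The calculation behind curves such as those in Figure 3 can be sketched by plugging the side-information Chernoff factor $C = \frac{1+K}{1+K+s}e^{-Ks/(1+K+s)}$, with $s = |a_m - a_p|^2/(4N_0)$, into (12). The sketch assumes equiprobable signals, unit signal energy $E_s = 1$ (so $N_0 = 10^{-\mathrm{SNR}/10}$), two-dimensional signals represented as complex numbers, and $\lambda$ fixed at $\lambda_{R_0} = 1/(2N_0)$ as discussed above; the helper names are hypothetical, and the output is not meant to reproduce the figure.

```python
import cmath
import math

def psk_constellation(m: int, es: float = 1.0):
    """Hypothetical helper: M-PSK points of energy es, as complex numbers."""
    return [cmath.rect(math.sqrt(es), 2.0 * math.pi * i / m) for i in range(m)]

def chernoff_factor_si(x: complex, xp: complex, n0: float, k: float = 0.0) -> float:
    """Chernoff factor with perfect side information and lambda = 1/(2 N0)."""
    s = abs(x - xp) ** 2 / (4.0 * n0)
    return (1.0 + k) / (1.0 + k + s) * math.exp(-k * s / (1.0 + k + s))

def cutoff_rate(points, n0: float, k: float = 0.0) -> float:
    """Eq. (12) for equiprobable signals, with lambda fixed at lambda_R0 (no further minimization)."""
    a = len(points)
    total = sum(chernoff_factor_si(x, xp, n0, k) for x in points for xp in points)
    return -math.log2(total / (a * a))

eight_psk = psk_constellation(8)
for snr_db in (10.0, 20.0, 30.0):
    n0 = 10.0 ** (-snr_db / 10.0)  # Es = 1, so N0 = 10^(-SNR/10)
    print(f"Es/N0 = {snr_db:4.1f} dB  R0 = {cutoff_rate(eight_psk, n0):.3f} bits/signal")
```

Larger values of `k` make the Chernoff factors smaller, so for a fixed $N_0$ the Rician channel yields a larger $R_0$ than the Rayleigh channel.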

2.3 Fading Channel Without Side Information

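With no information on the fading depth available, the decoder uses the maximum likelihood metric for the AWGN-channel, i.e., $m(x_r, y_r) = |y_r - x_r|^2$, and we obtain for the Chernoff bound

$$\begin{aligned}
P(\mathbf{x} \to \mathbf{x}') \le \min_\lambda C(\mathbf{x},\mathbf{x}',\lambda) &= \min_\lambda E_{\mathbf{b}}\left[\prod_{r=1}^{l} E_{y_r|x_r,b_r}\left[\exp\left(-\lambda\left(|y_r - x'_r|^2 - |y_r - x_r|^2\right)\right)\right]\right] \\
&= \min_\lambda \prod_{r=1}^{l} E_{b_r}\left[\exp\left(\lambda(|x_r|^2 - |x'_r|^2) - 2\lambda b_r(|x_r|^2 - x_r\cdot x'_r) + \lambda^2 N_0 |x_r - x'_r|^2\right)\right],
\end{aligned} \qquad (20)$$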
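which gives us the following expression for the Chernoff factors for the fading channel without side information:

$$C(x_r, x'_r, \lambda) = e^{\lambda(|x_r|^2 - |x'_r|^2) + \lambda^2 N_0 |x_r - x'_r|^2}\; E_b\left[e^{-2\lambda b(|x_r|^2 - x_r\cdot x'_r)}\right]. \qquad (21)$$

The expectation $E_b\left[e^{-2\lambda b(|x_r|^2 - x_r\cdot x'_r)}\right]$ can be evaluated as follows:

$$E_b\left[e^{-2\lambda b(|x_r|^2 - x_r\cdot x'_r)}\right] = \frac{e^{-K}}{\pi}\int_0^{\pi}\left(1 - \sqrt{\pi}\,\alpha\, e^{\alpha^2}\operatorname{erfc}(\alpha)\right)d\theta, \qquad (22)$$

where

$$\alpha = \frac{\lambda(|x_r|^2 - x_r\cdot x'_r)}{\sqrt{1+K}} - \sqrt{K}\cos\theta \qquad (23)$$

and $\operatorname{erfc}(z) = \frac{2}{\sqrt{\pi}}\int_z^\infty e^{-\gamma^2}\,d\gamma$ is the complementary error function. (22) must be evaluated numerically and substituted into (21) to yield the expression for the Chernoff factors of a Rayleigh/Rician fading channel with no side information available at the receiver. If we consider the limiting case of a Rayleigh channel with $K = 0$, the expectation in (22) can be evaluated in closed form to give

$$E_b\left[e^{-2\lambda b(|x_r|^2 - x_r\cdot x'_r)}\right] = 1 - \lambda\sqrt{\pi}\,(|x_r|^2 - x_r\cdot x'_r)\, e^{\lambda^2(|x_r|^2 - x_r\cdot x'_r)^2}\operatorname{erfc}\left(\lambda(|x_r|^2 - x_r\cdot x'_r)\right). \qquad (24)$$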
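For a quick numerical check of this expectation, the sketch below evaluates $E_b[e^{-2\lambda b c}] = \frac{e^{-K}}{\pi}\int_0^\pi \left(1 - \sqrt{\pi}\,\alpha e^{\alpha^2}\operatorname{erfc}\alpha\right)d\theta$ with $\alpha = \lambda c/\sqrt{1+K} - \sqrt{K}\cos\theta$ by Simpson's rule and compares the $K = 0$ case with the Rayleigh closed form $1 - \lambda\sqrt{\pi}\,c\,e^{\lambda^2c^2}\operatorname{erfc}(\lambda c)$. Here `c` stands for $|x_r|^2 - x_r\cdot x'_r$ and the function names are hypothetical. The plain `exp(alpha**2) * erfc(alpha)` product overflows for $\alpha$ above roughly 26, so a scaled complementary error function would be needed outside moderate parameter ranges.

```python
import math

def one_minus_term(alpha: float) -> float:
    """1 - sqrt(pi) * alpha * exp(alpha^2) * erfc(alpha); overflows for alpha above about 26."""
    return 1.0 - math.sqrt(math.pi) * alpha * math.exp(alpha * alpha) * math.erfc(alpha)

def fading_expectation(c: float, lam: float, k: float, n: int = 2000) -> float:
    """E_b[exp(-2 lam b c)] for Rician fading, Simpson's rule over theta in [0, pi] (n even)."""
    h = math.pi / n

    def integrand(theta: float) -> float:
        alpha = lam * c / math.sqrt(1.0 + k) - math.sqrt(k) * math.cos(theta)
        return one_minus_term(alpha)

    acc = integrand(0.0) + integrand(math.pi)
    for i in range(1, n):
        acc += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return math.exp(-k) / math.pi * acc * h / 3.0

def fading_expectation_rayleigh(c: float, lam: float) -> float:
    """Closed form for the Rayleigh case K = 0."""
    a = lam * c
    return 1.0 - lam * math.sqrt(math.pi) * c * math.exp(a * a) * math.erfc(a)

print(fading_expectation(0.5, 1.0, 0.0), fading_expectation_rayleigh(0.5, 1.0))
print(fading_expectation(0.5, 1.0, 2.0))
```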

The Chernoff factors in (21) can then be used in (12) to calculate the cutoff rate $R_0$ for the fading channel with no side information. Figure 4 shows $R_0$ curves for the fading channel without side information for several signal constellations. It is worth noting that, unlike the case of the AWGN-channel and the fading channel with side information, constant envelope schemes fare considerably better than rectangular constellations for the fading channel without side information. The reason for this superiority of constant envelope schemes is that fading radially shrinks or expands the decision region boundaries. Because the decoder does not know the fading depth $b_r$, it cannot adjust these boundaries. In the case of constant envelope signal constellations, the decision boundaries are radially symmetric and therefore independent of the fading depth [6, chapter 5].
3 Coding Schemes

3.1 8-PSK Trellis Codes 


A rate $R = k/n$ trellis code is generated by a binary convolutional encoder followed by a mapper as discussed in Section 2. Figure 5 shows a rate 2/3 convolutional encoder and a mapping to 8-PSK signals without parallel transitions, i.e., $\tilde k = k$. The trellis codes for fading channels discussed in this paper are based on systematic convolutional codes of rate 2/3. The output bits of the binary encoder are mapped into the set of 8-PSK signals. This coded system transmits 2 bits/modulation signal, maintaining the same data rate as uncoded QPSK modulation.



Figure 5: Rate 2/3 convolutional encoder with mapping from the binary output triples into 8-PSK signals.

Using the delay operator $D$, we may express the binary input sequences $u_0^1, u_1^1, u_2^1, \dots$ and $u_0^2, u_1^2, u_2^2, \dots$ as polynomials in $D$, i.e., $u^1(D) = u_0^1 + u_1^1 D + u_2^1 D^2 + \cdots$ and $u^2(D) = u_0^2 + u_1^2 D + u_2^2 D^2 + \cdots$. Similarly, the encoder connections may be expressed as polynomials in $D$ such that $H^0(D) = H_0^0 + H_1^0 D + \dots + H_\nu^0 D^\nu$, $H^1(D) = H_0^1 + H_1^1 D + \dots + H_\nu^1 D^\nu$, and $H^2(D) = H_0^2 + H_1^2 D + \dots + H_\nu^2 D^\nu$, where $\nu$ is the encoder memory and $H_0^0 = 1$. The encoding operation can then be expressed in matrix notation as


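$$[v^2(D), v^1(D), v^0(D)] = [u^2(D), u^1(D)] \left[ I_2 \;\middle|\; \begin{matrix} H^2(D)/H^0(D) \\ H^1(D)/H^0(D) \end{matrix} \right], \qquad (25)$$

where $v^0(D)$, $v^1(D)$, and $v^2(D)$ are the polynomials of the three binary output sequences entering the mapper and $I_2$ is the $2 \times 2$ identity matrix.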

We will usually give the encoder polynomials $H^0(D)$, $H^1(D)$, and $H^2(D)$ in octal form, i.e., the code $H^0(D) = 23$ (10011), $H^1(D) = 04$ (00100), $H^2(D) = 16$ (01110) means $H^0(D) = D^4 + D + 1$, $H^1(D) = D^2$, and $H^2(D) = D^3 + D^2 + D$.
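Multiplying (25) through by $H^0(D)$ gives the parity relation $v^0(D)H^0(D) = u^1(D)H^1(D) + u^2(D)H^2(D)$ over GF(2), with $v^1 = u^1$ and $v^2 = u^2$, which can be sketched as a direct-form recursion. The sketch below assumes the encoder starts in the all-zero state and that $H_0^0 = 1$; it does not implement the 8-PSK mapping of Figure 5, and the function names are hypothetical. The example uses the octal code 23, 04, 16 quoted above.

```python
def octal_to_taps(octal_str: str, memory: int):
    """Coefficients H_0..H_memory of a polynomial given in octal, e.g. '23' -> D^4 + D + 1."""
    value = int(octal_str, 8)
    return [(value >> j) & 1 for j in range(memory + 1)]

def encode_rate_2_3(u1, u2, h0, h1, h2):
    """Systematic feedback encoder: v0(D) H0(D) = u1(D) H1(D) + u2(D) H2(D) over GF(2).

    Returns (v2, v1, v0) as lists of bits, with v2 = u2 and v1 = u1 (systematic).
    Assumes h0[0] == 1 so the feedback recursion is realizable, and a zero initial state.
    """
    assert h0[0] == 1
    v0 = [0] * len(u1)
    for r in range(len(u1)):
        bit = 0
        for j in range(len(h0)):
            if r - j < 0:
                break
            bit ^= h1[j] & u1[r - j]
            bit ^= h2[j] & u2[r - j]
            if j >= 1:
                bit ^= h0[j] & v0[r - j]  # feedback from earlier parity bits
        v0[r] = bit
    return list(u2), list(u1), v0

H0 = octal_to_taps("23", 4)  # D^4 + D + 1
H1 = octal_to_taps("04", 4)  # D^2
H2 = octal_to_taps("16", 4)  # D^3 + D^2 + D
v2, v1, v0 = encode_rate_2_3([1, 0, 0, 0, 0, 0, 0, 0], [0] * 8, H0, H1, H2)
print(v0)
```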

3.2 Transfer Function Bound 

In this section we develop the transfer function bound on the performance of trellis codes over fading channels. This will lead to a design criterion for good codes. Although the bit error probability $P_b$ is the quantity of ultimate interest, a closely related and more readily determined quantity, the event error probability $P_e$, will be used to characterize the performance of trellis codes.

If $\mathbf{x}$ and $\mathbf{x}'$ are two symbol sequences corresponding to two paths through the trellis which are distinct for $l$ branches starting at node $j$, and the decoder chooses the encoded sequence $\mathbf{x}'$ over the correct sequence $\mathbf{x}$, this is called an error event of length $l$ starting at node $j$. An error event starts where the two paths diverge and ends where the two paths remerge. A union bound on $P_e$ for a trellis code may be obtained by summing the probabilities of the error events of all possible lengths given a particular correct sequence $\mathbf{x}$ and averaging this quantity over all possible correct sequences $\mathbf{x}$.

With each incorrect path we may associate a sequence of incorrect trellis states $S'_r$, while the sequence of correct states is denoted $S_r$. Any error event of length $l$ can then be described by the $l+1$ state pairs $(S_0, S'_0), \dots, (S_l, S'_l)$, with $S_0 = S'_0$, $S_l = S'_l$, and $S_r \ne S'_r$ for $0 < r < l$, i.e., the incorrect path must not touch the correct path during the error event. Associated with these paths are the two symbol sequences $\mathbf{x} = (x_1, \dots, x_l)$ and $\mathbf{x}' = (x'_1, \dots, x'_l)$, where $x_r, x'_r \in A$. The probability of an error event may be upper bounded using the Chernoff factors (with $\lambda = \lambda_{R_0}$) between the individual signals of the two code sequences which the paths generate. We may therefore write
$$\Pr[(S_0, S'_0), \dots, (S_l, S'_l)] \le C(\mathbf{x},\mathbf{x}') = \prod_{r=1}^{l} C(x_r, x'_r). \qquad (26)$$


We now introduce the transfer function matrix $T$ as the $2^{2\nu} \times 2^{2\nu}$ matrix whose rows and columns are labeled with state pairs $(S_r, S'_r)$ and whose element at the intersection of row $(S_i, S'_j)$ and column $(S_k, S'_l)$ is given by

$$t_{ij,kl} = \begin{cases} \dfrac{1}{m}\displaystyle\sum_{x_{ik}}\sum_{x'_{jl}} C(x_{ik}, x'_{jl}) & \text{if both transitions exist,} \\ 0 & \text{if the transition } S_i \to S_k \text{ or } S'_j \to S'_l \text{ does not exist,} \end{cases} \qquad (27)$$

where $x_{ik}$ is the signal the encoder transmits when it changes from state $S_i$ to $S_k$ and $x'_{jl}$ is the signal on the trellis branch connecting state $S'_j$ to $S'_l$. The element $t_{ij,kl}$ represents the Chernoff factors averaged over a pair of branches corresponding to the state pairs $(S_i, S_k)$ and $(S'_j, S'_l)$. Note that the sum over $x'_{jl}$ is over all parallel transitions leading from state $S'_j$ to state $S'_l$, the sum over $x_{ik}$ is over all parallel transitions that lead the encoder from state $S_i$ to $S_k$, and $m = 2^{k-\tilde k}$ is the number of parallel transitions. The factor $1/m$ reflects the fact that one of these transitions is selected with probability $1/m$ by the transmitter following the correct path.

Rearranging the state pairs in the matrix $T$, we can write

$$T = \begin{bmatrix} T_{CC} & T_{CI} \\ T_{IC} & T_{II} \end{bmatrix}, \qquad (28)$$

where $C$ is the set of state pairs such that $S_r = S'_r$, called correct state pairs, and $I$ is the set of state pairs such that $S_r \ne S'_r$, called incorrect state pairs. The $2^\nu \times 2^\nu$ submatrix $T_{CC}$ then contains all branch pairs in the trellis that diverge from a particular state and immediately remerge again, i.e., all the parallel transitions. $T_{CI}$ and $T_{IC}$ represent diverging and remerging branch pairs, respectively, while $T_{II}$ represents all branch pairs that do not touch at either end. All error events start in a correct state pair and return to a correct state pair. An error event may thus occur in two distinct ways:


• One step error events consist of immediate transitions from one correct state pair to another. These are the parallel transitions in the trellis and are given by the submatrix $T_{CC}$.

• Error events extending over more than one branch are given by state pair sequences starting in the subset $C$ and returning to it via one or more visits to the subset $I$, i.e., in matrix notation,

$$T_{CI}T_{IC} + T_{CI}T_{II}T_{IC} + T_{CI}T_{II}^2T_{IC} + \cdots = T_{CI}\left[I + T_{II} + T_{II}^2 + \cdots\right]T_{IC} = T_{CI}\left(I - T_{II}\right)^{-1}T_{IC}, \qquad (29)$$

where $I$ denotes the $(2^{2\nu} - 2^\nu) \times (2^{2\nu} - 2^\nu)$ identity matrix. The event error probability may now be upper bounded by the transfer function bound, i.e.,

$$P_e \le \frac{1}{2^\nu}\,\mathbf{1}^T\left\{T_{CC} + T_{CI}\left[I - T_{II}\right]^{-1}T_{IC}\right\}\mathbf{1}, \qquad (30)$$

where $\mathbf{1}$ is the $2^\nu$-dimensional all-one vector. The postmultiplication by $\mathbf{1}$ represents the union of the error events from one state to any other state, while the premultiplication by $\mathbf{1}^T$ sums over all $2^\nu$ starting states, each of which is assumed to have probability $1/2^\nu$. Equation (30) can be written in the following form:

$$P_e \le \sum_t A_t P_t, \qquad (31)$$

where the sum is over all code sequence pairs $\mathbf{x}, \mathbf{x}'$ whose two code word error probability has a specific value of the Chernoff bound $P_t = \prod_{r=1}^{l} C(x_r, x'_r)$, and $A_t$ is the average number of code sequence pairs with Chernoff bound $P_t$.
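The bound $P_e \le 2^{-\nu}\,\mathbf{1}^T\{T_{CC} + T_{CI}(I - T_{II})^{-1}T_{IC}\}\mathbf{1}$ translates directly into a few lines of linear algebra. The sketch below assumes the four submatrices of the partitioned transfer function matrix have already been filled with the averaged Chernoff factors, and that the spectral radius of $T_{II}$ is below one so that the geometric series $I + T_{II} + T_{II}^2 + \cdots$ converges; it solves a linear system instead of forming the inverse explicitly. The toy matrices in the example are hypothetical numbers for a $\nu = 1$ trellis, not derived from any code in this paper.

```python
import numpy as np

def transfer_function_bound(t_cc, t_ci, t_ic, t_ii):
    """P_e <= 2^-nu * 1^T {T_CC + T_CI (I - T_II)^-1 T_IC} 1.

    Assumes the spectral radius of T_II is below one so the geometric series converges.
    """
    n_states = t_cc.shape[0]  # 2^nu correct state pairs
    eye = np.eye(t_ii.shape[0])
    middle = t_ci @ np.linalg.solve(eye - t_ii, t_ic)  # T_CI (I - T_II)^-1 T_IC
    ones = np.ones(n_states)
    return float(ones @ (t_cc + middle) @ ones) / n_states

# Hypothetical toy matrices for nu = 1: two correct and two incorrect state pairs.
t_cc = np.zeros((2, 2))          # no parallel transitions
t_ci = 0.1 * np.ones((2, 2))     # diverging branch pairs
t_ic = 0.1 * np.ones((2, 2))     # remerging branch pairs
t_ii = 0.2 * np.eye(2)           # branch pairs that stay apart
print(transfer_function_bound(t_cc, t_ci, t_ic, t_ii))
```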

3.3 Effective Length 

In order to gain insight into the problem, we will approximate the expression in (30). In the submatrix $T_{CC}$, one term will usually be dominant. This is the nearest neighbor to the correct path among all the parallel transitions. Let this dominant term be $P_c$, i.e.,

$$P_c = C(x_r, x'_r), \qquad (32)$$

where the signal pair $x_r, x'_r$ is the one with the largest value of the Chernoff factor. Similarly, the second term inside the bracket of (30) will have a dominant term, which we denote by $P_l$. This is an error event of length $l > 1$ that extends over more than one branch. According to (26), $P_l$ can be written as

$$P_l = \prod_{r=1}^{l} C(x_r, x'_r). \qquad (33)$$
We now look at two particular cases, the additive white Gaussian noise channel and the Rayleigh fading channel with side information. For the Gaussian channel

$$P_c = C(x_r, x'_r) = e^{-\frac{(x_r - x'_r)^2}{4N_0}} = e^{-\frac{\Delta^2}{4N_0}}, \qquad (34)$$

where the dominant signal pair $(x_r, x'_r)$ is the one with the smallest value of $(x_r - x'_r)^2$, i.e., the one with the smallest squared Euclidean distance, denoted by $\Delta^2$. Similarly, $P_l$ may be evaluated as

$$P_l = \prod_{r=1}^{l} C(x_r, x'_r) = e^{-\frac{1}{4N_0}\sum_{r=1}^{l}(x_r - x'_r)^2} = e^{-\frac{d_{free}^2}{4N_0}}, \qquad (35)$$

where the dominant path pair is the one with the smallest cumulative squared Euclidean distance, denoted by $d_{free}^2$, the minimum free squared Euclidean distance of the code. In the Gaussian case it is unimportant over how many branches this minimum free squared Euclidean distance is accumulated, i.e., the right-hand side of (35) is independent of $l$.

For the Rayleigh fading channel with $K = 0$, we obtain the following expression for $P_c$ from (19):

$$P_c = C(x_r, x'_r) = \frac{1}{1 + \frac{(x_r - x'_r)^2}{4N_0}} \approx \frac{4N_0}{(x_r - x'_r)^2}, \qquad (36)$$

where the approximation is tight for $E_s/N_0 > 6$ dB, which is usually the case for transmission over a fading channel. Hence in the Rayleigh fading case the dominant term for parallel transitions is also the one with the smallest squared Euclidean distance $\Delta^2$ between two signals. $P_l$, on the other hand, may be evaluated as

$$P_l = \prod_{r=1}^{l} C(x_r, x'_r) = \prod_{r=1}^{l} \frac{1}{1 + \frac{(x_r - x'_r)^2}{4N_0}} \approx \frac{(4N_0)^{l'}}{\prod_{r=1,\,x_r \ne x'_r}^{l} (x_r - x'_r)^2}, \qquad (37)$$

where $l'$ equals $l$, the length of the path pair, less the number of matches, i.e., the number of branches where $x_r = x'_r$. We call $l'$ the effective length of the path pair and $l'_m = \min(l')$ the effective length of the code, where the minimum is taken over all path pairs. The approximation is again tight for $E_s/N_0 > 6$ dB. Equation (37) is dominated by the paths with the shortest effective length, i.e., $l' = l'_m$, and among those by the one having the smallest product in the denominator, i.e., the one with the smallest squared product distance $d_p^2 = \prod_{r=1,\,x_r \ne x'_r}^{l} (x_r - x'_r)^2$. Hence the total event error probability in the fading case can be approximated by

$$P_e \approx k_1 \frac{4N_0}{\Delta^2} + k_2 \frac{(4N_0)^{l'_m}}{d_p^2}, \qquad (38)$$

where $k_1$ is the average number of parallel transitions with the smallest squared Euclidean distance $\Delta^2$ and $k_2$ is the average number of trellis path pairs with the smallest squared product distance $d_p^2$. It is clear from (38) that parallel transitions are most harmful in the Rayleigh case and should be avoided: their term decays only as $(E_s/N_0)^{-1}$, whereas the second term decays as $(E_s/N_0)^{-l'_m}$. Parallel transitions are acceptable only if $\Delta^2$ is considerably larger than $d_p^2$ and the code is used at a low SNR $E_s/N_0$. In that case it is questionable whether the necessary synchronization can be maintained in order to decode any but the simplest constellations. In Table 1 we list the effective length ($l'_m$) and the minimum squared product distance ($d_p^2$), i.e., the smallest squared product distance of those path pairs with $l' = l'_m$, for the set of rate 2/3 trellis codes designed by Ungerboeck [11], [12] for the AWGN channel.


i.d.   ν    H^0(D)   H^1(D)   H^2(D)   l'_m   d_p^2
g2     2    5        2        -        1      4
g3     3    11       02       04       2      8
g4     4    23       04       16       3      4.68
g5     5    45       16       34       2      8
g6     6    103      030      066      3      16
g7     7    277      054      122      4      2.75
g8     8    435      072      130      3      16
g9     9    1007     164      ?        3      16
g10    10   ?        164      770      4      ?

Table 1: Ungerboeck's 8-PSK codes.


These codes use an 8-PSK signal set and are among the best codes known for the AWGN channel. However, they suffer a significant performance degradation on the Rayleigh fading channel due to their small $l'_m$ and the slow increase of $l'_m$ with code complexity, which explains their poor performance discussed in Section 4.
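The definitions of the effective length $l'$ and the squared product distance $d_p^2$ in (37) reduce to a branch-by-branch computation over a path pair. The sketch below assumes unit-energy 8-PSK points ($E_s = 1$) and paths given as lists of signal indices; the helper names are hypothetical, and this illustrates the definitions only, not the code search used to build the tables. Three differing branches with squared distances $2-\sqrt{2}$, $2$ and $4$ give $l' = 3$ and $d_p^2 = 8(2-\sqrt{2}) \approx 4.686$, which would be consistent with the value 4.68 listed for g4.

```python
import cmath
import math

M = 8  # 8-PSK with unit signal energy E_s = 1 (normalization assumed here)


def psk_point(index: int) -> complex:
    """Complex baseband 8-PSK signal point with unit energy."""
    return cmath.exp(2j * math.pi * index / M)


def effective_length_and_product_distance(x, x_prime):
    """Return (l', d_p^2) for two equal-length paths given as 8-PSK indices.

    l' counts the branches where x_r != x'_r; d_p^2 is the product of the
    squared Euclidean distances on exactly those branches (cf. (37)).
    """
    if len(x) != len(x_prime):
        raise ValueError("the two paths must have the same length")
    l_eff = 0
    dp2 = 1.0
    for a, b in zip(x, x_prime):
        if a == b:  # a "match" contributes neither to l' nor to d_p^2
            continue
        l_eff += 1
        dp2 *= abs(psk_point(a) - psk_point(b)) ** 2
    return l_eff, dp2


print(effective_length_and_product_distance([0, 0, 0, 0], [1, 2, 4, 0]))  # (3, ≈4.686)
```

The fourth branch is a match, so it shortens $l'$ from 4 to 3 and leaves the product untouched, exactly the effect that makes codes with few differing branches weak on the fading channel.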

Table 2 shows a list of 8-PSK codes designed for fading channels, i.e., designed for a large effective length $l'_m$. The codes were found using either an exhaustive search or one of the construction methods presented in [13] and [14].


i.d.   ν    H^0(D)   H^1(D)   H^2(D)   l'_m   d_p^2
f2     2    5        2        -        1      4
f3     3    11       02       04       2      8
f4     4    23       04       16       3      4.68
f5     5    43       14       36       3      16
f6     6    103      036      154      4      8
f7     7    223      076      314      4      8
f8     8    673      336      164      5      5.49
f9     9    1413     756      244      5      18.75
f10    10   3303     1676     504      5      128
f11    11   6403     3436     1264     6      10.98

Table 2: Codes designed for fading channels.


Note that, with the exception of the very short codes f2 = g2, f3 = g3, and f4 = g4, the codes in Table 2 have a larger effective length $l'_m$, which should give them superior performance on fading channels, especially at high values of $E_s/N_0$. The Gaussian codes, designed for a large free squared Euclidean distance $d_{free}^2$, achieve this distance over only a few branches in most cases. This proves detrimental on fading channels, where the distance should be spread more evenly over all the branches of a trellis path pair.
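The effect of $l'_m$ in (38) can be seen by evaluating the trellis term for codes g8 ($l'_m = 3$, $d_p^2 = 16$) and f8 ($l'_m = 5$, $d_p^2 = 5.49$). This is a sketch only: it assumes $E_s = 1$, sets the unknown multiplicity $k_2$ to 1 for both codes (a hypothetical value, since the tables do not list multiplicities), and uses $k_1 = 0$ because neither code has parallel transitions ($l'_m > 1$). At 20 dB the f8 term is far smaller, while at 5 dB the ordering reverses; (38) is an asymptotic approximation that the text only claims to be tight above about 6 dB.

```python
def pe_rayleigh_approx(es_n0_db, l_eff, dp2, k2=1.0, k1=0.0, delta2=1.0):
    """Approximation (38) with E_s = 1, so that N0 = 10^(-Es/N0[dB] / 10).

    k1, delta2: multiplicity and squared distance of parallel transitions.
    k2, l_eff, dp2: multiplicity, effective length and product distance.
    """
    four_n0 = 4.0 * 10 ** (-es_n0_db / 10)
    parallel = k1 * four_n0 / delta2        # decays only as (Es/N0)^-1
    trellis = k2 * four_n0 ** l_eff / dp2   # decays as (Es/N0)^-l'_m
    return parallel + trellis


for snr_db in (5, 20):
    g8 = pe_rayleigh_approx(snr_db, l_eff=3, dp2=16.0)  # Table 1, code g8
    f8 = pe_rayleigh_approx(snr_db, l_eff=5, dp2=5.49)  # Table 2, code f8
    print(snr_db, g8, f8)
```

Setting `k1` to a nonzero value shows why parallel transitions are harmful: their term falls by only one decade per 10 dB, so it eventually dominates any trellis term with $l'_m \ge 2$.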

3.4 Binary Signaling 

Due to the synchronization problem, binary signaling is usually used on fading channels. Let us assume for the sake of discussion that the trellis is generated by a rate $R = k/n$ binary convolutional encoder. Instead of the two-dimensional multilevel/phase signals $x_r$ on the branches of the trellis in the case of bandwidth efficient coding, $n$ binary signals are used. These signals are antipodal if coherent reception is possible (BPSK signaling) and orthogonal otherwise. We also assume that the binary signals are interleaved to make the channel memoryless. The Chernoff factor for the two signals $x_r$ and $x'_r$ is then given by
$$C(x_r, x'_r) = \prod_{i=1}^{n} C(x_{ri}, x'_{ri}), \qquad (39)$$

where $x_{ri}$ is the $i$-th bit on the $r$-th branch of $x$, and the path error probability bound of (26) becomes

$$\Pr\left[(S_0, \ldots, S_l) \to (S'_0, \ldots, S'_l)\right] \le \prod_{r=1}^{l}\prod_{i=1}^{n} C(x_{ri}, x'_{ri}) = \prod_{k=1}^{nl} C(x_k, x'_k), \qquad (40)$$

where $x_k$ and $x'_k$, $1 \le k \le nl$, are the two binary signal sequences associated with the error event $(S_0, S'_0), \ldots, (S_l, S'_l)$. Substituting the Chernoff factors for binary signals, we obtain

$$\Pr\left[(S_0, \ldots, S_l) \to (S'_0, \ldots, S'_l)\right] \le \prod_{k=1}^{nl} \frac{1}{1 + \frac{(x_k - x'_k)^2}{4N_0}} \approx \prod_{k=1,\,x_k \ne x'_k}^{nl} \frac{4N_0}{(x_k - x'_k)^2}. \qquad (41)$$

Since $(x_k - x'_k)^2$ equals 0 or $\Delta^2$, depending on whether $x_k = x'_k$ or $x_k \ne x'_k$, we can simplify (41) to obtain

$$\Pr\left[(S_0, \ldots, S_l) \to (S'_0, \ldots, S'_l)\right] \lesssim \prod_{k=1,\,x_k \ne x'_k}^{nl} \frac{4N_0}{\Delta^2} = \left(\frac{4N_0}{\Delta^2}\right)^{d(x, x')}, \qquad (42)$$

where $d(x, x')$ denotes the Hamming distance between the two binary sequences $x$ and $x'$ that make up the two paths in question. Equation (42) shows that the well-known Hamming distance remains the design criterion for binary signaling on the fading channel. It is worth noting that a 3 dB coding gain may be achieved if phase synchronization is possible, since the binary signals may then be chosen to be antipodal, i.e., $\Delta^2 = 4E_s$, where $E_s$ is the signal energy. If phase synchronization is not possible, the signals must be orthogonal, giving $\Delta^2 = 2E_s$, reflecting a 3 dB loss. For a more extensive discussion of binary signaling for the fading channel see [6] and [16].
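The 3 dB statement follows directly from (42): halving $\Delta^2$ must be compensated by doubling $E_s$. The sketch below evaluates the right-hand side of (42) for both signal sets; the function name is hypothetical and $E_s/N_0$ is given in dB.

```python
import math


def binary_fading_bound(es_n0_db, hamming_distance, antipodal=True):
    """Right-hand side of (42): (4 N0 / Delta^2)^d for binary signals.

    Delta^2 = 4 E_s for antipodal (coherent BPSK) signals and 2 E_s for
    orthogonal signals.
    """
    es_n0 = 10 ** (es_n0_db / 10)
    delta2_over_n0 = (4.0 if antipodal else 2.0) * es_n0
    return (4.0 / delta2_over_n0) ** hamming_distance


print(binary_fading_bound(10, 5))  # ≈ 1e-05
# Orthogonal signals need 10*log10(2) ≈ 3.01 dB more to reach the same bound.
print(binary_fading_bound(10 + 10 * math.log10(2), 5, antipodal=False))  # ≈ 1e-05
```

The Hamming distance enters only as an exponent, so each additional unit of $d(x, x')$ steepens the bound by one more factor of $(E_s/N_0)^{-1}$, the binary counterpart of the effective length.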
3.5 Rician Fading Channels

The Rayleigh fading channel is the limiting case of a more general channel, the Rician fading channel. In the Rician case the expression for $P_c$ is, from (19),

$$P_c = C(x_r, x'_r) = \frac{1+K}{1+K+\frac{(x_r - x'_r)^2}{4N_0}}\, \exp\left(-\frac{K\,\frac{(x_r - x'_r)^2}{4N_0}}{1+K+\frac{(x_r - x'_r)^2}{4N_0}}\right). \qquad (43)$$


$P_l$, on the other hand, is given by

$$P_l = \prod_{r=1}^{l} C(x_r, x'_r) = \prod_{r=1}^{l} \frac{1+K}{1+K+\frac{(x_r - x'_r)^2}{4N_0}}\, \exp\left(-\frac{K\,\frac{(x_r - x'_r)^2}{4N_0}}{1+K+\frac{(x_r - x'_r)^2}{4N_0}}\right). \qquad (44)$$

Comparing (43) and (44), it becomes obvious that the performance degradation due to parallel transitions becomes less and less severe as the energy $E_d$ received on the direct path increases with respect to the energy $E_m$ received via the diffuse multipaths, i.e., with growing $K$. Hence for Rician fading channels with strong line-of-sight reception, parallel transitions become feasible. For small values of the Rice factor $K$, i.e., $K \ll E_s/N_0$, (43) and (44) can be approximated as


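$$P_c = C(x_r, x'_r) \approx \frac{4N_0(1+K)e^{-K}}{(x_r - x'_r)^2}, \qquad (45)$$

$$P_l = \prod_{r=1}^{l} C(x_r, x'_r) \approx \frac{\left(4N_0(1+K)e^{-K}\right)^{l'}}{\prod_{r=1,\,x_r \ne x'_r}^{l} (x_r - x'_r)^2}. \qquad (46)$$

It becomes evident, then, that for small values of the Rice factor $K$, the effective length $l'_m$ is once again the dominant code design criterion.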
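As a check on (43) and (45), the sketch below evaluates the exact single-pair Chernoff factor and its small-$K$ approximation. It assumes $K$ is given in linear units (so $K = 7$ dB corresponds to $10^{0.7} \approx 5.0$) and uses hypothetical function names. Setting $K = 0$ recovers the Rayleigh factor of (36), and the ratio of approximation to exact value tends to 1 only as $N_0$ becomes small relative to $\Delta^2/(1+K)$, which is why (45) requires $K \ll E_s/N_0$.

```python
import math


def rician_chernoff(delta2, n0, k):
    """Chernoff factor (43) for one signal pair; k is the linear Rice factor."""
    s = delta2 / (4.0 * n0)
    return (1 + k) / (1 + k + s) * math.exp(-k * s / (1 + k + s))


def rician_chernoff_approx(delta2, n0, k):
    """Approximation (45), intended for K much smaller than E_s/N0."""
    return 4.0 * n0 * (1 + k) * math.exp(-k) / delta2


k_linear = 10 ** (7 / 10)  # K = 7 dB, the "typical" value used in Section 4
for n0 in (1e-2, 1e-4, 1e-6):
    exact = rician_chernoff(4.0, n0, k_linear)
    approx = rician_chernoff_approx(4.0, n0, k_linear)
    print(n0, exact, approx, approx / exact)
```

For large $K$ the exponential factor in (43) takes over and the behavior approaches the Gaussian case of (34), which is why the Gaussian codes recover on the $K = 15$ dB channel.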


3.6 Distance Spectrum of Fading Channel Codes 

Using the approximations (37) or (46) for the Chernoff factors $C(x_r, x'_r)$, the expression for the transfer function bound (31) can be written in the form

$$P_e \le \sum_{l',\, d_p^2} A_{l', d_p^2}\, P_{l', d_p^2}, \qquad (47)$$

where $A_{l', d_p^2}$ is the average number of code sequence pairs $x, x'$ with effective length $l'$ and product distance $d_p^2$, $P_{l', d_p^2}$ is the event error probability bound for an error event with effective length $l'$ and product distance $d_p^2$, and the average is taken over all code sequences $x$ in the code. The parameter $A_{l', d_p^2}$ is called the average multiplicity of all code sequence pairs with event error probability bound $P_{l', d_p^2}$. Note that in (47) the error bound $P_{l', d_p^2}$ includes both terms of the form (32) (parallel transition error events) and terms of the form (33) (length $l > 1$ error events).

For fading channels a spectral line is defined by an effective length $l'$, a product distance $d_p^2$, and an average multiplicity $A_{l', d_p^2}$. The set of all spectral lines of a code is called the distance spectrum of that code. In [15], we present an algorithm that computes the distance spectrum for quasi-regular codes, a general class of codes to which all the best known trellis codes belong. This algorithm has been adapted to compute the distance spectrum of codes for fading channels. Figure 6 shows the distance spectrum of the 16-state 8-PSK code from Table 2 (code f4), where the factors in the product distances have been normalized by the square root of the signal energy, $\sqrt{E_s}$. The effective length of this code is $l'_m = 3$, its minimum squared product distance is $d_p^2 = 4.68$, and its multiplicity is $A_{3, 4.68} = 2$. Note that Figure 6 is divided into separate diagrams for the different effective lengths $l'$. The distance spectrum is used in Section 4 to evaluate the performance of codes on fading channels.
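Because each term of (47) depends, under (37), only on $E_s/N_0$, $l'$ and $d_p^2$, a single distance spectrum yields the whole error curve. The sketch below assumes $E_s = 1$ and spectral lines given as `(l_eff, dp2, multiplicity)` tuples; only the first line is taken from the text (code f4: $l'_m = 3$, $d_p^2 = 4.68$, multiplicity 2), the other two are hypothetical placeholders, and the function name is ours.

```python
def pe_from_spectrum(spectrum, es_n0_db):
    """Evaluate (47) using the Rayleigh approximation (37), with E_s = 1.

    spectrum: iterable of spectral lines (l_eff, dp2, multiplicity), where
    dp2 is normalized to the signal energy E_s.
    """
    four_n0 = 4.0 * 10 ** (-es_n0_db / 10)
    return sum(a * four_n0 ** l_eff / dp2 for l_eff, dp2, a in spectrum)


# First line from the text (code f4); the remaining two are placeholders.
spectrum_f4 = [(3, 4.68, 2.0), (4, 8.0, 4.0), (5, 16.0, 8.0)]
for snr_db in (10, 20, 30):
    print(snr_db, pe_from_spectrum(spectrum_f4, snr_db))
```

In practice the spectrum would be truncated after enough lines that additional terms no longer change the sum at the SNRs of interest.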

3.7 Block Code Performance on a Quantized Fading Channel 

In this section we discuss the use of block codes as an alternative to trellis codes on fading channels. The received two-dimensional signal $y$ is mapped into a discrete output alphabet, as illustrated in Figure 7 for an 8-PSK signal set. For MPSK signaling, each received signal $y$ is decoded into one of $M$ angular sections, called decision regions and denoted by $A_i$. Because of the high probability of a deep fade, i.e., the reception of a signal with a very low amplitude $b$, we have introduced a circular erasure region $A_e$ with radius $\rho$. This is done with the idea of using Reed-Solomon block codes, which can handle erasures. In conjunction with interleaving, this transforms the fading channel into a discrete memoryless erasure channel.

Let $q_{ij}$ be the conditional probability that the received signal $y$ lies in $A_j$, given that the signal $x_i$ was sent, and let $q_e$ be the probability that $y$ falls into $A_e$, i.e.,
$$q_{ij} = \int_{A_j} p(y \mid x_i)\, dy = \frac{1}{\pi N_0} \int_{A_j} \int_0^\infty e^{-\frac{|y - b x_i|^2}{N_0}}\, 2b\, e^{-b^2}\, db\, dy \qquad (48)$$

and

$$q_e = 1 - \sum_{j=0}^{M-1} q_{ij}. \qquad (49)$$

Figure 6: Distance spectrum of 16-state trellis coded 8-PSK modulation (Code f4). Separate panels for effective lengths 3, 4, and 5; horizontal axis: product distance.

Figure 7: Decision regions for an 8-PSK signal set with an erasure region.
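The integrals in (48) and (49) can also be estimated by simulation. The sketch below swaps in a Monte Carlo estimate for the numerical integration, under stated assumptions: $E_s = 1$, $x_0 = 1$ is sent, the fading amplitude $b$ is Rayleigh with $E[b^2] = 1$, the noise is complex Gaussian with variance $N_0$, and $A_j$ is the 45° sector centred on 8-PSK point $j$. The function name and the radius $\rho = 0.3$ are hypothetical; the text optimizes $\rho$ for each SNR.

```python
import numpy as np


def quantized_8psk_probs(es_n0_db, rho, num_samples=200_000, seed=0):
    """Monte Carlo estimates of q_0j (j = 0..7) in (48) and q_e in (49).

    Assumptions: E_s = 1, x_0 = 1 is sent, b is Rayleigh with E[b^2] = 1,
    noise is complex Gaussian with variance N0 (N0/2 per dimension).
    """
    rng = np.random.default_rng(seed)
    n0 = 10 ** (-es_n0_db / 10)
    h = (rng.standard_normal(num_samples)
         + 1j * rng.standard_normal(num_samples)) / np.sqrt(2)
    b = np.abs(h)  # Rayleigh amplitude with pdf 2b exp(-b^2)
    noise = np.sqrt(n0 / 2) * (rng.standard_normal(num_samples)
                               + 1j * rng.standard_normal(num_samples))
    y = b * 1.0 + noise
    erased = np.abs(y) < rho  # y falls inside the erasure region A_e
    # Nearest 8-PSK angle, i.e. the index j of the sector A_j containing y.
    sector = np.round(np.angle(y) / (2 * np.pi / 8)).astype(int) % 8
    q = np.bincount(sector[~erased], minlength=8) / num_samples
    q_e = float(erased.mean())
    return q, q_e


q, q_e = quantized_8psk_probs(es_n0_db=20.0, rho=0.3)
p_erase = q_e * q_e + 2 * q_e * (1 - q_e)  # P_Delta for a two-signal RS symbol
p_correct = q[0] * q[0]                    # P_c = q_00 q_00
print(q[0], q_e, p_erase, p_correct)
```

Since the decision on PSK phase does not depend on the positive amplitude $b$, only the erasure test uses the received magnitude.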

As an example, we will discuss the performance of an RS(63,42) block code using this quantized 8-PSK channel. The symbols of the RS(63,42) block code are 6 bits long, i.e., two concatenated 8-PSK signals. This code has a symbol rate $k/n = 42/63$, which translates into a bit rate of $(42/63) \cdot 6/2 = 2$ bits/signal, i.e., the same transmission rate as uncoded QPSK, the reference transmission system. The receiver operates such that if either of the two received vectors $y_1, y_2$ belonging to the same RS symbol is decoded into $A_e$, the whole RS symbol is erased.

Using the Berlekamp-Massey decoding algorithm, the RS decoder can correct any combination of $t$ errors and $e$ erasures as long as $e + 2t \le n - k$, where $n - k$ is the number of parity symbols. For this particular code, the symbol erasure probability $P_\Delta$ is given by $P_\Delta = q_e q_e + 2q_e(1 - q_e)$, the probability of receiving a correct symbol is $P_c = q_{00} q_{00}$, and the probability of receiving a symbol in error is given by $P_\epsilon = 1 - P_c - P_\Delta$.
The block error probability $P_B$ of such an RS code may then be computed as

$$P_B = \sum_{t=0}^{\lfloor (n-k)/2 \rfloor} \binom{n}{t} P_\epsilon^{t} \sum_{e=n-k+1-2t}^{n-t} \binom{n-t}{e} P_\Delta^{e} P_c^{\,n-t-e} + \sum_{t=\lfloor (n-k)/2 \rfloor + 1}^{n} \binom{n}{t} P_\epsilon^{t} (1 - P_\epsilon)^{n-t}. \qquad (50)$$

Figure 8 compares the performance of this bandwidth efficient RS block code to uncoded QPSK. The erasure threshold radius $\rho$ has been optimized for each value of $E_s/N_0$.
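Equation (50) translates directly into code. The sketch below computes $P_B$ from the per-signal probabilities $q_{00}$ and $q_e$ using the symbol probabilities defined above; the function name and the example probabilities are hypothetical. The first sum covers error counts $t$ that could be corrected but fail because too many erasures ($e \ge n - k + 1 - 2t$) occur; the second covers $t > \lfloor (n-k)/2 \rfloor$, which always causes a block error.

```python
from math import comb


def rs_block_error_probability(q00, qe, n=63, k=42):
    """Block error probability (50) for an RS(n, k) code whose symbols are two
    8-PSK signals, with errors-and-erasures decoding (e + 2t <= n - k).

    q00: probability that one received signal lands in the correct sector.
    qe:  probability that one received signal lands in the erasure region.
    """
    p_erase = qe * qe + 2 * qe * (1 - qe)  # P_Delta: at least one signal erased
    p_correct = q00 * q00                   # P_c: both signals correct
    p_error = 1.0 - p_correct - p_erase     # P_epsilon: symbol in error
    t_max = (n - k) // 2
    total = 0.0
    # Decoding fails when e >= n - k + 1 - 2t erasures accompany t errors.
    for t in range(t_max + 1):
        inner = sum(comb(n - t, e) * p_erase**e * p_correct**(n - t - e)
                    for e in range(n - k + 1 - 2 * t, n - t + 1))
        total += comb(n, t) * p_error**t * inner
    # More than (n - k) / 2 symbol errors always cause a block error.
    total += sum(comb(n, t) * p_error**t * (1 - p_error)**(n - t)
                 for t in range(t_max + 1, n + 1))
    return total


# Hypothetical per-signal probabilities, for illustration only.
print(rs_block_error_probability(q00=0.99, qe=0.002))
```

The second sum uses $\sum_e \binom{n-t}{e} P_\Delta^e P_c^{n-t-e} = (1 - P_\epsilon)^{n-t}$, which is why it needs no inner loop.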


4 Comparison of Code Performance 

In this section we present performance curves for the event error probability $P_e$ of several TCM schemes. The 8-PSK trellis codes introduced in Section 3 are quasi-regular³, and we have used a variant of the algorithm reported in [15] to evaluate $P_e$. The two-codeword error probability bound in (10) is given by

$$P(x \to x') \le \min_{\lambda} C(x, x', \lambda) = \min_{\lambda} \prod_{r=1}^{l} C(x_r, x'_r, \lambda), \qquad (51)$$

where the tightest bound is obtained by individually minimizing (51) over $\lambda$ for each path pair $x, x'$. In view of the large number of code sequence pairs $x, x'$, this is computationally infeasible. However, any value of $\lambda$ may be used in (51) to obtain a looser bound. We choose $\lambda = \lambda_{R_0}$, i.e., the value of $\lambda$ that maximizes $R_0$, and the two-codeword bound of (51) becomes equivalent to the terms $P_t$ in the event error probability bound of (31), i.e.,

$$P_e \le \sum_t A_t P_t, \quad \text{where } P_t = \prod_{r=1}^{l} C(x_r, x'_r). \qquad (52)$$

The algorithm in [15] is a stack-type algorithm which successively searches code sequence pairs in increasing order of a metric associated with those code sequence pairs. This metric is additive over the individual signal pairs in the code sequences, and in [15] it is the squared Euclidean distance between those signal pairs. For $\lambda = \lambda_{R_0}$, if we write

$$P_t = \prod_{r=1}^{l} C(x_r, x'_r) = \exp\left(\sum_{r=1}^{l} \ln C(x_r, x'_r)\right) \qquad (53)$$

and choose $-\ln C(x_r, x'_r)$ as the branch metric, the algorithm can be used in a slightly altered form to calculate the event error probability of any quasi-regular TCM scheme.

³ For a precise definition of quasi-regularity, the reader is referred to [15].
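The role of the additive metric $-\ln C$ in (53) can be illustrated with a best-first search over a pair-state graph. This is a minimal sketch in the spirit of the stack-type search described above, not the algorithm of [15]: it does not exploit quasi-regularity, and the graph format, function name and toy graph are hypothetical. Because every branch metric is nonnegative ($C \le 1$), error events are popped in nondecreasing metric order, so truncating at `max_metric` yields the largest terms $P_t$ of (52) first. Repeating the search from every correct starting pair and averaging would supply the multiplicities $A_t$.

```python
import heapq
import math


def enumerate_error_events(edges, correct, start, max_metric, max_depth=64):
    """List error events leaving `start` in order of increasing metric -ln P_t.

    edges:   dict mapping a pair state to a list of (next_pair_state, C), where
             C (0 < C <= 1) is the Chernoff factor of that branch pair; the
             trivial pairs with x_r == x'_r must be left out.
    correct: set of correct pair states (S_r == S'_r).
    Returns a list of (metric, P_t) with P_t = exp(-metric), as in (53).
    """
    counter = 0  # tie-breaker so the heap never has to compare states
    heap = [(0.0, counter, start, 0)]
    events = []
    while heap:
        metric, _, state, depth = heapq.heappop(heap)
        if depth > 0 and state in correct:
            # The two paths have remerged: one complete error event.
            events.append((metric, math.exp(-metric)))
            continue
        if depth >= max_depth:
            continue
        for nxt, c in edges.get(state, []):
            new_metric = metric - math.log(c)  # branch metric -ln C >= 0
            if new_metric <= max_metric:
                counter += 1
                heapq.heappush(heap, (new_metric, counter, nxt, depth + 1))
    return events


# Hypothetical toy graph: one correct pair "C0" and one incorrect pair "I1".
edges = {"C0": [("C0", 0.1), ("I1", 0.5)],  # parallel transition, divergence
         "I1": [("C0", 0.5), ("I1", 0.2)]}  # remerge, stay unmerged
events = enumerate_error_events(edges, {"C0"}, "C0", max_metric=3.0)
print(sum(p for _, p in events))  # ≈ 0.4 = 0.25 + 0.1 + 0.05
```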

In general, $P_t = \prod_{r=1}^{l} C(x_r, x'_r)$ will depend on $E_s/N_0$ as well as on the specific signals $x_r$ and $x'_r$, $1 \le r \le l$, and we must run the algorithm separately for each value of the SNR. If, however, the Chernoff factors are accurately approximated by (37) or (46), where they depend only on $E_s/N_0$, $l'$, and $d_p^2$ (fading channels with side information), the distance spectrum can be used to evaluate $P_e$ for all values of $E_s/N_0$, and the algorithm is used only once per code to calculate the distance spectrum. In this case the distance spectrum gives a good measure of code performance. As shown in the following figures, codes with a good distance spectrum usually also perform well on channels where the approximations (37) or (46) may not be tight (fading channels with no side information, Rician channels with a large Rice factor).

Figures 9 to 13 show the event error probability bound of some selected codes, where we have used (53) to evaluate $P_e$. Figure 9 shows the performance on a Rayleigh channel with side information. The superiority of the new codes presented in Table 2 is evident. At an error probability of $P_e = 10^{-5}$, for example, code f8 shows a 3.5 dB improvement over code g8. At $P_e = 10^{-6}$, the difference is 4.5 dB. Note further that the asymptotic behavior of the Gaussian codes g8 and g6 is identical (both have $l'_m = 3$ and $d_p^2 = 16$ in Table 1), and it is therefore useless to employ the higher complexity of code g8.

Figure 10 shows the same set of codes on a Rician channel with $K = 7$ dB, a typical value of the Rice factor. Again the fading codes from Table 2 outperform the Gaussian codes from Table 1 of the same complexity. Code f8, for example, achieves an error probability of $P_e = 10^{-6}$ at an $E_s/N_0$ of 12.2 dB, while code g8 achieves the same error performance at 13.7 dB. At $P_e = 10^{-7}$, the gain of f8 over g8 is 2.5 dB.

Figure 11 shows the same set of codes on a $K = 15$ dB Rician channel. This value of $K$ indicates strong line-of-sight reception. In this environment, the Gaussian codes fare slightly better due to their superior minimum free squared Euclidean distance. Codes f8 and g8 have almost identical error performance, while f6 loses 1.2 dB compared to g6.

Figures 12 and 13 show the same set of codes on a Rayleigh fading channel and on a Rician channel with $K = 7$ dB, both without side information. While the relative performance of the codes with respect to each other is preserved, the lack of side information causes a rather severe degradation in performance, particularly for the Rayleigh channel. At an error probability of $P_e = 10^{-5}$, for instance, f8 gains 3 dB over g8 on the Rayleigh channel and 2 dB on the $K = 7$ dB Rician channel. Fortunately, for the Rayleigh channel side information can be extracted relatively easily at the receiver by monitoring a pilot tone or, in the case of constant envelope signaling, by simply estimating the received signal amplitude.

Comparing the performance of the RS block code to TCM, we see that the RS code is very poor for $E_s/N_0 < 20$ dB. The error curves for the RS code have a rather sharp cutoff and catch up with the trellis codes at an $E_s/N_0$ of about 30 dB for the Rayleigh channel with side information. With no side information, the RS code performance curve crosses the f8 performance curve at $E_s/N_0 \approx 25$ dB. This behavior is typical when comparing the performance of block codes and trellis codes.

5 Conclusions 

We have presented a general method of bounding the event error probability of TCM schemes and applied this method to the fading channel. We have shown that the effective length and the minimum squared product distance replace the minimum free squared Euclidean distance as design criteria for Rayleigh fading channels and for Rician fading channels with a substantial multipath component. We have presented codes specifically constructed for fading channels that outperform equivalent codes designed for the AWGN channel. The use of RS block codes with expanded signal sets becomes interesting only for large SNRs, where they begin to outperform trellis codes.
Figure 10: Performance of codes on a Rician channel with K = 7 dB and with side information. (Vertical axis: error event probability.)

Figure 11: Performance of trellis codes on a Rician channel with K = 15 dB and with side information. (Vertical axis: error event probability.)

Figure 13: Performance of trellis codes on a Rician channel with K = 7 dB and without side information. (Vertical axis: error event probability.)